Table design question for a `Check all that apply` survey

database-design

I'm looking for advice on a design for a portion of a table.

Background

We're collecting survey data for multiple sites, and one question is a check all that apply. Currently, there are from 2-5 available options based on the site, though that number could increase. When I get the data, I get anywhere from 2-5 columns (based on the number of options available on that survey), each column in the result set responds to one of the options. The data in the columns will all be either NULL or a text value representing that ordinal option for that particular survey. Different sites will also have different options from a set, so there is no guarantee that column 1 is always the same response. There are 13 total possible options to pick, at this point, and no one site has more than 5 available, but it's possible that we'll add more options to pick from, and that the there could be more than 5 options per site in the future.

Sample Data

Survey 1:

Col 1       Col 2     Col 3
A           B
            B
A                     C
A           B         C

Survey 2:

Col 1       Col 2     Col 3
B           D
            D         E
                      E
B                         

Survey 3:

Col 1       Col 2     Col 3
A           D
            D         F
A                     F
A                           

The question

How do I best store this data?

Currently

I get the data in a .CSV, and for expediency & urgent deadlines, I've been pasting it into an Excel spreadsheet. (That's why I'm looking to move it to an actual table…) I've created one column for each possible answer, and if the value in CSV.Col1 = Excel.TableHeader, then I put a 1 in that column. This allows me to quickly filter the Excel rows and sum the 1s for a quick total of how many there were. This is probably not ideal for an actual database.

Possibilities Considered

  1. I've considered assigning each of the 13 options a value and summing them to store them in a single column. For Example:
    • A = 1
    • B = 2
    • C = 4
    • D = 8
      So my Survey 1 example from above would store the values of:
    • 3
    • 2
    • 5
    • 6
  2. I've considered leaving it as it is currently with 13 columns of either Boolean, Number, or Text format. Each column would hold [True|False], [1|0], or [A|B|C|D...], as appropriate.
  3. Creating a 1:M dependent table to store a result per row, linked to the master table row for that survey. The child table would have a PK, a Foreign Key back to the parent table, and a text column to hold the response.

The back end

The data will be moved from Excel to Access, with the intent to move it into SQL Server as soon as possible.

Best Answer

Option 3 is classic 3NF design. If you need to add data to the table quickly, it will likely be the highest performing approach for inserts, so if you need to insert the data quickly and have high concurrency concerns that may be the way to go.

To me, it sounds like you're trying to solve an analytical problem. If that is the case, I'd go with option 2. This is a de-normalized approach. Your users are likely already thinking about their data in that way, so they'll appreciate the simplicity in it. With the way that you're receiving your data, it could be the most difficult to achieve, but the effort to get it in simply will be worth it. Your users will appreciate the elegance which is hopefully your goal.

Good Luck!