Sql-server – SQL Server Database Design for Reporting

sql server

We are designing a reporting solution for survey results. Although datasets are reasonable in size (rarely more than 500.000 respondents and 50 questions), performance is obviously a major concern.

Due to the nature of the solution, most queries return aggregated values and no locks are needed.

Storing answers in a "normal" tabular format (i.e. a column for each question and a row for each respondent) works well in terms of performance, and allows us to query the data like so:

SELECT COUNT(*) FROM Answers WHERE Gender = 'M' and Age < 20 

However, this design requires a new table for each survey as the questions (columns) differ, which is obviously not an ideal solution.

Therefore, we are considering a design where we store answer data in a table that would basically just hold a respondent ID, a question ID and an answer value, thereby "transposing" the data (i.e. there would be a row for each respondent/question combination in the Answers table).

In this design, we would have to use exists conditions (or joins) to filter our data, e.g:

SELECT COUNT(*) 
FROM Answers AS A1 
WHERE A1.QuestionID = 'Gender' AND A1.VALUE = 'M' AND EXISTS
( 
    SELECT * 
    FROM Answers AS A2 
    WHERE A2.RespondentID = A1.RespondentID AND A2.QuestionID = 'Age' AND A2.Value > 18
)

This would allow as to handle any survey without changing the database schema but we are concerned about what the impact might be on performance?

Or perhaps there is a better way to deal with this issue altogether?

Best Answer

We did some performance tests and found that - with scenarios that are typical for our application domain - it is 10-20 times faster to query data when stored in a regular "tabular" format.