Database Design for Questions and Answers

database-design normalization

I'm building a website that will host a variety of questionnaires. Each questionnaire has a different number of questions and each question has a different number of answers. I have attempted to design a database to hold the questions and possible answers for each questionnaire, but I end up having separate tables for each question. Is this correct or am I going wrong somewhere?

For example

Table for question x

  Answer  | Answer ID
       1         019
       2         089

I can't have a fixed-size table for all questions, as I don't have a maximum number of answers. This obviously means I could end up with hundreds of tables, one for each question.

Best Answer

We have an application at our workplace that does a similar thing. It works by having a table that contains a list of all possible questions, like so:

CREATE TABLE QUESTIONS
(
   ID INT NOT NULL PRIMARY KEY,
   SUMMARY NVARCHAR(64) NOT NULL UNIQUE,
   DESCRIPTION NVARCHAR(255) NULL
);

Then you have an ANSWERS table and a QUESTIONAIRES table defined using the same structure as above. Once you have these tables, you define a table to hold the list of question/answer possibilities, like so:

CREATE TABLE QUESTION_ANSWERS
(
   ID INT NOT NULL PRIMARY KEY,
   QUESTION INT NOT NULL REFERENCES QUESTIONS(ID),
   ANSWER INT NOT NULL REFERENCES ANSWERS(ID)
);

Once you have these you can then create a table to contain the responses as such:

CREATE TABLE RESPONSES
(
   QUESTIONAIRE INT NOT NULL REFERENCES QUESTIONAIRES(ID),
   RESPONSE INT NOT NULL REFERENCES QUESTION_ANSWERS(ID)
);

This will give you maximum flexibility, allowing you to add new questions and answers without having to change your database design. It can get a bit complicated if you need to version the questions/answers, but this should give you a good foothold to work from.
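
As a rough illustration of how these tables fit together, here is a minimal sketch using Python's built-in sqlite3 module. SQLite, the simplified column types, the sample rows, and the helper names answers_for and responses_for are all assumptions made for the example; they are not part of the application described above.

```python
import sqlite3

# In-memory SQLite database, used only to make the sketch runnable.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default

conn.executescript("""
CREATE TABLE QUESTIONS     (ID INTEGER PRIMARY KEY, SUMMARY TEXT NOT NULL UNIQUE, DESCRIPTION TEXT);
CREATE TABLE ANSWERS       (ID INTEGER PRIMARY KEY, SUMMARY TEXT NOT NULL UNIQUE, DESCRIPTION TEXT);
CREATE TABLE QUESTIONAIRES (ID INTEGER PRIMARY KEY, SUMMARY TEXT NOT NULL UNIQUE, DESCRIPTION TEXT);
CREATE TABLE QUESTION_ANSWERS (
    ID INTEGER PRIMARY KEY,
    QUESTION INT NOT NULL REFERENCES QUESTIONS(ID),
    ANSWER INT NOT NULL REFERENCES ANSWERS(ID)
);
CREATE TABLE RESPONSES (
    QUESTIONAIRE INT NOT NULL REFERENCES QUESTIONAIRES(ID),
    RESPONSE INT NOT NULL REFERENCES QUESTION_ANSWERS(ID)
);
""")

# Hypothetical sample data: question 1 has three answers, question 2 has two.
conn.executemany("INSERT INTO QUESTIONS (ID, SUMMARY) VALUES (?, ?)",
                 [(1, "Favourite colour"), (2, "Owns a pet")])
conn.executemany("INSERT INTO ANSWERS (ID, SUMMARY) VALUES (?, ?)",
                 [(1, "Red"), (2, "Green"), (3, "Blue"), (4, "Yes"), (5, "No")])
conn.executemany("INSERT INTO QUESTION_ANSWERS (ID, QUESTION, ANSWER) VALUES (?, ?, ?)",
                 [(1, 1, 1), (2, 1, 2), (3, 1, 3), (4, 2, 4), (5, 2, 5)])
conn.execute("INSERT INTO QUESTIONAIRES (ID, SUMMARY) VALUES (1, 'Customer survey')")
# The respondent picked "Green" for question 1 and "Yes" for question 2.
conn.executemany("INSERT INTO RESPONSES (QUESTIONAIRE, RESPONSE) VALUES (?, ?)",
                 [(1, 2), (1, 4)])


def answers_for(question_id):
    """Return the possible answers for one question, in insertion order."""
    rows = conn.execute(
        """SELECT A.SUMMARY
           FROM QUESTION_ANSWERS QA
           JOIN ANSWERS A ON A.ID = QA.ANSWER
           WHERE QA.QUESTION = ?
           ORDER BY QA.ID""",
        (question_id,))
    return [row[0] for row in rows]


def responses_for(questionaire_id):
    """Return (question, chosen answer) pairs recorded for one questionnaire."""
    rows = conn.execute(
        """SELECT Q.SUMMARY, A.SUMMARY
           FROM RESPONSES R
           JOIN QUESTION_ANSWERS QA ON QA.ID = R.RESPONSE
           JOIN QUESTIONS Q ON Q.ID = QA.QUESTION
           JOIN ANSWERS A ON A.ID = QA.ANSWER
           WHERE R.QUESTIONAIRE = ?
           ORDER BY Q.ID""",
        (questionaire_id,))
    return [tuple(row) for row in rows]
```

Each question gets as many rows in QUESTION_ANSWERS as it has possible answers, so no per-question table is needed.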

I hope this helps you.

Related Solutions

MySQL – Matching single column against multiple values without self-joining table in MySQL

I have found a clever way to do this query without a self join.

I ran these commands in MySQL 5.5.8 for Windows and got the following results:

use test
DROP TABLE IF EXISTS answers;
CREATE TABLE answers (user_id VARCHAR(10),question_id INT,answer_value VARCHAR(20));
INSERT INTO answers VALUES
('Sally',1,'Pouch'),
('Sally',2,'Peach'),
('John',1,'Pooch'),
('John',2,'Duke');
INSERT INTO answers VALUES
('Sally',1,'Pooch'),
('Sally',2,'Peach'),
('John',1,'Pooch'),
('John',2,'Duck');

SELECT user_id,question_id,GROUP_CONCAT(DISTINCT answer_value) given_answers
FROM answers GROUP BY user_id,question_id;

+---------+-------------+---------------+
| user_id | question_id | given_answers |
+---------+-------------+---------------+
| John    |           1 | Pooch         |
| John    |           2 | Duke,Duck     |
| Sally   |           1 | Pouch,Pooch   |
| Sally   |           2 | Peach         |
+---------+-------------+---------------+

This display reveals that John gave two different answers to question 2 and Sally gave two different answers to question 1.

To find the questions that a user answered in more than one way, place the above query in a subquery and count the commas in the list of given answers; the number of commas plus one is the count of distinct answers (this assumes that no answer value itself contains a comma):

SELECT user_id,question_id,given_answers,
(LENGTH(given_answers) - LENGTH(REPLACE(given_answers,',','')))+1 multianswer_count
FROM (SELECT user_id,question_id,GROUP_CONCAT(DISTINCT answer_value) given_answers
FROM answers GROUP BY user_id,question_id) A;

I got this:

+---------+-------------+---------------+-------------------+
| user_id | question_id | given_answers | multianswer_count |
+---------+-------------+---------------+-------------------+
| John    |           1 | Pooch         |                 1 |
| John    |           2 | Duke,Duck     |                 2 |
| Sally   |           1 | Pouch,Pooch   |                 2 |
| Sally   |           2 | Peach         |                 1 |
+---------+-------------+---------------+-------------------+

Now just filter out rows where multianswer_count = 1 using another subquery:

SELECT * FROM (SELECT user_id,question_id,given_answers,
(LENGTH(given_answers) - LENGTH(REPLACE(given_answers,',','')))+1 multianswer_count
FROM (SELECT user_id,question_id,GROUP_CONCAT(DISTINCT answer_value) given_answers
FROM answers GROUP BY user_id,question_id) A) AA WHERE multianswer_count > 1;

This is what I got:

+---------+-------------+---------------+-------------------+
| user_id | question_id | given_answers | multianswer_count |
+---------+-------------+---------------+-------------------+
| John    |           2 | Duke,Duck     |                 2 |
| Sally   |           1 | Pouch,Pooch   |                 2 |
+---------+-------------+---------------+-------------------+

Essentially, I performed three table scans: 1 on the main table, 2 on the small subqueries. NO JOINS !!!
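
For comparison, a different technique from the one used above is to let the database count the distinct answers directly with COUNT(DISTINCT ...) and filter with HAVING. This also avoids joins and does not depend on answers being free of commas. The sketch below is only an illustration: it runs against SQLite through Python's sqlite3 module rather than MySQL, and reuses the sample rows from above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE answers (user_id TEXT, question_id INT, answer_value TEXT)")

# Same eight rows as the two INSERT statements in the MySQL session above.
conn.executemany("INSERT INTO answers VALUES (?, ?, ?)", [
    ("Sally", 1, "Pouch"), ("Sally", 2, "Peach"),
    ("John", 1, "Pooch"), ("John", 2, "Duke"),
    ("Sally", 1, "Pooch"), ("Sally", 2, "Peach"),
    ("John", 1, "Pooch"), ("John", 2, "Duck"),
])

# Count distinct answers per (user, question) and keep only groups with more than one.
rows = conn.execute("""
    SELECT user_id, question_id, COUNT(DISTINCT answer_value) AS multianswer_count
    FROM answers
    GROUP BY user_id, question_id
    HAVING COUNT(DISTINCT answer_value) > 1
    ORDER BY user_id, question_id
""").fetchall()

print(rows)  # [('John', 2, 2), ('Sally', 1, 2)]
```

The GROUP_CONCAT version remains useful when the list of given answers itself needs to be displayed.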

Give it a Try !!!

Questionnaire database design – which way is better

Definitely do not hard-code your questionnaire. Use a relational database or XML files. I propose the following tables:

Questionnaire: General description of questionnaire. Title, name of survey, questionnaire release date, version, and so on.
Section: The sections a questionnaire is made up of. Number of the section, section title, description.
Question: The questions belonging to a section. Number of the question, question text, description, question type (text, multiple choice, etc.).
Question_Choice: The possible answers belonging to a question corresponding to the single checkboxes, radio buttons, and so on. Text of the choice, choice number, order.
Respondent: The persons answering the questions. Personal data, user number.
Interview: Interviews or tests or surveys (depending on the nature of the questionnaire) belonging to one respondent and one questionnaire. If a respondent can only ever answer one questionnaire (or if the survey is anonymous), this table is unnecessary and can be merged with the Respondent table. Interview date (or test date or survey date), interviewer (if it applies).
Answer: Answers belonging to one interview (or respondent, see above) and one question. Answer text (for text type questions), choice (for radio buttons).
Answer_Choice: Choices belonging to one Answer and one Question_Choice when multiple choices can be checked.
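
The table list above could be sketched roughly as follows. This is one possible reading only: every column name, type, and key is an assumption chosen for illustration, and SQLite (through Python's sqlite3 module) is used only so that the sketch runs as written.

```python
import sqlite3

# Hypothetical columns; only the table names come from the list above.
SCHEMA = """
CREATE TABLE Questionnaire (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    version TEXT,
    release_date TEXT
);
CREATE TABLE Section (
    id INTEGER PRIMARY KEY,
    questionnaire_id INTEGER NOT NULL REFERENCES Questionnaire(id),
    section_number INTEGER NOT NULL,
    title TEXT
);
CREATE TABLE Question (
    id INTEGER PRIMARY KEY,
    section_id INTEGER NOT NULL REFERENCES Section(id),
    question_number INTEGER NOT NULL,
    question_text TEXT NOT NULL,
    question_type TEXT NOT NULL          -- e.g. 'text' or 'multiple choice'
);
CREATE TABLE Question_Choice (
    id INTEGER PRIMARY KEY,
    question_id INTEGER NOT NULL REFERENCES Question(id),
    choice_number INTEGER NOT NULL,
    choice_text TEXT NOT NULL
);
CREATE TABLE Respondent (
    id INTEGER PRIMARY KEY,
    name TEXT
);
CREATE TABLE Interview (
    id INTEGER PRIMARY KEY,
    respondent_id INTEGER NOT NULL REFERENCES Respondent(id),
    questionnaire_id INTEGER NOT NULL REFERENCES Questionnaire(id),
    interview_date TEXT
);
CREATE TABLE Answer (
    id INTEGER PRIMARY KEY,
    interview_id INTEGER NOT NULL REFERENCES Interview(id),
    question_id INTEGER NOT NULL REFERENCES Question(id),
    answer_text TEXT,                                   -- for text questions
    choice_id INTEGER REFERENCES Question_Choice(id)    -- for radio buttons
);
CREATE TABLE Answer_Choice (
    answer_id INTEGER NOT NULL REFERENCES Answer(id),
    question_choice_id INTEGER NOT NULL REFERENCES Question_Choice(id),
    PRIMARY KEY (answer_id, question_choice_id)         -- one row per checked box
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# List the tables that were created, as a quick sanity check.
table_names = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]
print(sorted(table_names))
```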

This is a very normalized approach; however, you could decide to concatenate choices into one string, store them as a bit pattern, or simplify it in some other way, depending on your needs.
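
To make the bit-pattern option concrete, here is a small sketch in Python. The function names and the convention that choice number 1 maps to the lowest bit are assumptions for the example; the resulting integer could be stored in a single column instead of separate Answer_Choice rows.

```python
def encode_choices(selected, num_choices):
    """Pack 1-based choice numbers into one integer, one bit per choice."""
    mask = 0
    for choice in selected:
        if not 1 <= choice <= num_choices:
            raise ValueError(f"choice {choice} is outside 1..{num_choices}")
        mask |= 1 << (choice - 1)  # choice 1 -> bit 0, choice 2 -> bit 1, ...
    return mask


def decode_choices(mask):
    """Recover the sorted list of 1-based choice numbers from a bit pattern."""
    choices = []
    position = 1
    while mask:
        if mask & 1:
            choices.append(position)
        mask >>= 1
        position += 1
    return choices


# A respondent ticked boxes 1, 3 and 4 of a five-choice question.
pattern = encode_choices([1, 3, 4], num_choices=5)
print(pattern)                  # 13 (binary 01101)
print(decode_choices(pattern))  # [1, 3, 4]
```

The trade-off is that a packed value is compact but harder to query per choice in SQL than the normalized Answer_Choice table.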
