I have found a clever way to do this query without a self join.
I ran these commands in MySQL 5.5.8 for Windows and got the following results:
use test
DROP TABLE IF EXISTS answers;
CREATE TABLE answers (user_id VARCHAR(10),question_id INT,answer_value VARCHAR(20));
INSERT INTO answers VALUES
('Sally',1,'Pouch'),
('Sally',2,'Peach'),
('John',1,'Pooch'),
('John',2,'Duke');
INSERT INTO answers VALUES
('Sally',1,'Pooch'),
('Sally',2,'Peach'),
('John',1,'Pooch'),
('John',2,'Duck');
SELECT user_id,question_id,GROUP_CONCAT(DISTINCT answer_value) given_answers
FROM answers GROUP BY user_id,question_id;
+---------+-------------+---------------+
| user_id | question_id | given_answers |
+---------+-------------+---------------+
| John | 1 | Pooch |
| John | 2 | Duke,Duck |
| Sally | 1 | Pouch,Pooch |
| Sally | 2 | Peach |
+---------+-------------+---------------+
This display reveals that John gave two different answers to question 2 and Sally gave two different answers to question 1.
To catch which questions were answered differently by all users, just place the above query in a subquery and check for a comma in the list of given answers to get the count of distinct answers as follows:
SELECT user_id,question_id,given_answers,
(LENGTH(given_answers) - LENGTH(REPLACE(given_answers,',','')))+1 multianswer_count
FROM (SELECT user_id,question_id,GROUP_CONCAT(DISTINCT answer_value) given_answers
FROM answers GROUP BY user_id,question_id) A;
I got this:
+---------+-------------+---------------+-------------------+
| user_id | question_id | given_answers | multianswer_count |
+---------+-------------+---------------+-------------------+
| John | 1 | Pooch | 1 |
| John | 2 | Duke,Duck | 2 |
| Sally | 1 | Pouch,Pooch | 2 |
| Sally | 2 | Peach | 1 |
+---------+-------------+---------------+-------------------+
Now just filter out rows where multianswer_count = 1 using another subquery:
SELECT * FROM (SELECT user_id,question_id,given_answers,
(LENGTH(given_answers) - LENGTH(REPLACE(given_answers,',','')))+1 multianswer_count
FROM (SELECT user_id,question_id,GROUP_CONCAT(DISTINCT answer_value) given_answers
FROM answers GROUP BY user_id,question_id) A) AA WHERE multianswer_count > 1;
This is what I got:
+---------+-------------+---------------+-------------------+
| user_id | question_id | given_answers | multianswer_count |
+---------+-------------+---------------+-------------------+
| John | 2 | Duke,Duck | 2 |
| Sally | 1 | Pouch,Pooch | 2 |
+---------+-------------+---------------+-------------------+
Essentially, I performed three table scans: 1 on the main table, 2 on the small subqueries. NO JOINS !!!
Give it a Try !!!
Assuming question has just one correct answer and possible answers don't have to be different for each set of tests, you can do something like (it's a rough idea, I might under (or over) engineer some concepts):
Question : id, question_text, correct_answer_id (FK to answer.id)
Answer : id, answer_text
Question_possible_answer (question_id (FK to question), answer_id(FK to anwer))
Test : id, test_name
Test_question : id, test_id(FK to Test), Question (FK to Question)
QUESTIONNAIRE (id, customer_id, test_id, questionnaire_date) -- I assume that a customer may pass test more than once, so no unique key on (customer_id, test_id), but rather on
(customer_id, test_id,questionnaire_date)
QUESTIONNAIRE_Answer (questionnaire_id (FK to QUESTIONNAIRE), test_question_id(FK to Test_question), answer_id(FK to Answer))
Now for each question you add some possible answers in Question_possible_answer table, then you create a new test and populate Test_question table with the questions. You create a new QUESTIONNAIRE by linking test and customer, all answers from this customer go to QUESTIONNAIRE_Answer table.
Best Answer
Definitely do not hard code your questionnaire. Use a relational database or xml files. I propose the following tables
Questionnaire
: General description of questionnaire. Title, name of survey, questionnaire release date, version, and so on.Section
: The sections a questionnaire is made up. Number of the section, section title, description.Question
: The questions belonging to a section. Number of the question, question text, description, question type (text, multiple choice, etc.).Question_Choice
: The possible answers belonging to a question corresponding to the single checkboxes, radio buttons, and so on. Text of the choice, choice number, order.Respondent
: The persons answering the questions. Personal data, user number.Interview
: Interviews or tests or surveys (dependent on the nature of the questionnaire) belonging to one respondent and one questionnaire. If a respondent can always answer only one questionnaire (or if the survey is anonymous), this table is obsolete and can be merged with the Respondent table. Interview date (or test date or survey date), interviewer (if it applies).Answer
: Answers belonging to one interview (or respondent, see above) and one question. Answer text (for text type questions), choice (for radio buttons).Answer_Choice
: Choices belonging to one Answer and one Question_Choice when multiple choices can be checked.This is a very normalized approach; however, you could decide to concatenate choices into one string or to store them as bit pattern or simplify it in some other way depending on your needs.