I have found a clever way to do this query without a self join.
I ran these commands in MySQL 5.5.8 for Windows and got the following results:
use test
DROP TABLE IF EXISTS answers;
CREATE TABLE answers (user_id VARCHAR(10),question_id INT,answer_value VARCHAR(20));
INSERT INTO answers VALUES
('Sally',1,'Pouch'),
('Sally',2,'Peach'),
('John',1,'Pooch'),
('John',2,'Duke');
INSERT INTO answers VALUES
('Sally',1,'Pooch'),
('Sally',2,'Peach'),
('John',1,'Pooch'),
('John',2,'Duck');
SELECT user_id,question_id,GROUP_CONCAT(DISTINCT answer_value) given_answers
FROM answers GROUP BY user_id,question_id;
+---------+-------------+---------------+
| user_id | question_id | given_answers |
+---------+-------------+---------------+
| John | 1 | Pooch |
| John | 2 | Duke,Duck |
| Sally | 1 | Pouch,Pooch |
| Sally | 2 | Peach |
+---------+-------------+---------------+
This display reveals that John gave two different answers to question 2 and Sally gave two different answers to question 1.
To catch which questions were answered differently by all users, just place the above query in a subquery and check for a comma in the list of given answers to get the count of distinct answers as follows:
SELECT user_id,question_id,given_answers,
(LENGTH(given_answers) - LENGTH(REPLACE(given_answers,',','')))+1 multianswer_count
FROM (SELECT user_id,question_id,GROUP_CONCAT(DISTINCT answer_value) given_answers
FROM answers GROUP BY user_id,question_id) A;
I got this:
+---------+-------------+---------------+-------------------+
| user_id | question_id | given_answers | multianswer_count |
+---------+-------------+---------------+-------------------+
| John | 1 | Pooch | 1 |
| John | 2 | Duke,Duck | 2 |
| Sally | 1 | Pouch,Pooch | 2 |
| Sally | 2 | Peach | 1 |
+---------+-------------+---------------+-------------------+
Now just filter out rows where multianswer_count = 1 using another subquery:
SELECT * FROM (SELECT user_id,question_id,given_answers,
(LENGTH(given_answers) - LENGTH(REPLACE(given_answers,',','')))+1 multianswer_count
FROM (SELECT user_id,question_id,GROUP_CONCAT(DISTINCT answer_value) given_answers
FROM answers GROUP BY user_id,question_id) A) AA WHERE multianswer_count > 1;
This is what I got:
+---------+-------------+---------------+-------------------+
| user_id | question_id | given_answers | multianswer_count |
+---------+-------------+---------------+-------------------+
| John | 2 | Duke,Duck | 2 |
| Sally | 1 | Pouch,Pooch | 2 |
+---------+-------------+---------------+-------------------+
Essentially, I performed three table scans: 1 on the main table, 2 on the small subqueries. NO JOINS !!!
Give it a Try !!!
Best Answer
We have an application at our workplace that does a similar thing. It works by having a table that contains a list of all possible questions like such:
Then you have an ANSWERS and a QUESTIONAIRES table defined using the same structure above. Once you have these two tables you then define a table to hold the list of question/answer possibilties as such:
Once you have these you can then create a table to contain the responses as such:
This will give you maximum flexibility allowing you to add new questions and answers without having to change your database design frequently. It can get a bit complicated if you need to version the questions/answers but this should give you a good foothold to work from.
I hope this helps you.