I have found a clever way to do this query without a self join.
I ran these commands in MySQL 5.5.8 for Windows and got the following results:
use test
DROP TABLE IF EXISTS answers;
CREATE TABLE answers (user_id VARCHAR(10),question_id INT,answer_value VARCHAR(20));
INSERT INTO answers VALUES
('Sally',1,'Pouch'),
('Sally',2,'Peach'),
('John',1,'Pooch'),
('John',2,'Duke');
INSERT INTO answers VALUES
('Sally',1,'Pooch'),
('Sally',2,'Peach'),
('John',1,'Pooch'),
('John',2,'Duck');
SELECT user_id,question_id,GROUP_CONCAT(DISTINCT answer_value) given_answers
FROM answers GROUP BY user_id,question_id;
+---------+-------------+---------------+
| user_id | question_id | given_answers |
+---------+-------------+---------------+
| John | 1 | Pooch |
| John | 2 | Duke,Duck |
| Sally | 1 | Pouch,Pooch |
| Sally | 2 | Peach |
+---------+-------------+---------------+
This display reveals that John gave two different answers to question 2 and Sally gave two different answers to question 1.
To catch which questions were answered differently by all users, just place the above query in a subquery and check for a comma in the list of given answers to get the count of distinct answers as follows:
SELECT user_id,question_id,given_answers,
(LENGTH(given_answers) - LENGTH(REPLACE(given_answers,',','')))+1 multianswer_count
FROM (SELECT user_id,question_id,GROUP_CONCAT(DISTINCT answer_value) given_answers
FROM answers GROUP BY user_id,question_id) A;
I got this:
+---------+-------------+---------------+-------------------+
| user_id | question_id | given_answers | multianswer_count |
+---------+-------------+---------------+-------------------+
| John | 1 | Pooch | 1 |
| John | 2 | Duke,Duck | 2 |
| Sally | 1 | Pouch,Pooch | 2 |
| Sally | 2 | Peach | 1 |
+---------+-------------+---------------+-------------------+
Now just filter out rows where multianswer_count = 1 using another subquery:
SELECT * FROM (SELECT user_id,question_id,given_answers,
(LENGTH(given_answers) - LENGTH(REPLACE(given_answers,',','')))+1 multianswer_count
FROM (SELECT user_id,question_id,GROUP_CONCAT(DISTINCT answer_value) given_answers
FROM answers GROUP BY user_id,question_id) A) AA WHERE multianswer_count > 1;
This is what I got:
+---------+-------------+---------------+-------------------+
| user_id | question_id | given_answers | multianswer_count |
+---------+-------------+---------------+-------------------+
| John | 2 | Duke,Duck | 2 |
| Sally | 1 | Pouch,Pooch | 2 |
+---------+-------------+---------------+-------------------+
Essentially, I performed three table scans: 1 on the main table, 2 on the small subqueries. NO JOINS !!!
Give it a Try !!!
Considering that the question is labeled database-design
I will offer a solution without that CSV field. Even if you decide to keep it, this may offer some ideas.
Question
contains all possible questions.
Answer
has all possible answers (to any question)
QuestionAnswer
contains all QA combinations offered, note that there must exist a default answer for "I do not know".
UserQA
contains questions user answered.
- Note that rule "There can be one and only one correct answer for each question" (IsCorrect = 'Y'), has to be implemented on the application level or as a stored procedure/trigger.
To get questions that a user with specific_UserId
has not yet answered
select qq.QuestionID
from Question as qq
where not exists (select 1
from UserQA as xx
where xx.QuestionID = qq.QuestionID
and xx.UserID = specific_UserId );
Or, better if there may be some new questions which do not have defined answers yet
select distinct qq.QuestionID
from QuestionAnswer as qq
where not exists (select 1
from UserQA as xx
where xx.QuestionID = qq.QuestionID
and xx.UserID = specific_UserId );
Best Answer
Using same stored procedure from previous answer, you can generate a new columns names, casting
ans_question_xx
with the name of each question.Rextester here