Mysql – Matching single column against multiple values without self-joining table in MySQL

database-designjoin;MySQL

We have a table that we use to store answers to questions. We need to be able to find users that have certain answers to particular questions. So, if our table consists of the following data:

user_id     question_id     answer_value  
Sally        1               Pooch  
Sally        2               Peach  
John         1               Pooch  
John         2               Duke

and we want to find users who answer 'Pooch' for question 1 and 'Peach' for question 2, the following SQL will (obviously) not worK:

select user_id 
from answers 
where question_id=1 
  and answer_value = 'Pooch'
  and question_id=2
  and answer_value='Peach'

My first thought was to self-join the table for each answer we are looking for:

select a.user_id 
from answers a, answers b 
where a.user_id = b.user_id
  and a.question_id=1
  and a.answer_value = 'Pooch'
  and b.question_id=2
  and b.answer_value='Peach'

This works, but since we allow an arbitrary number of search filters, we need to find something much more efficient. My next solution was something like this:

select user_id, count(question_id) 
from answers 
where (
       (question_id=2 and answer_value = 'Peach') 
    or (question_id=1 and answer_value = 'Pooch')
      )
group by user_id 
having count(question_id)>1

However, we want users to be able to take the same questionnaire twice, so they could potentially have two answers to question 1 in the answers table.

So, now I'm at a loss. What's the best way to approach this? Thanks!

Best Answer

I have found a clever way to do this query without a self join.

I ran these commands in MySQL 5.5.8 for Windows and got the following results:

use test
DROP TABLE IF EXISTS answers;
CREATE TABLE answers (user_id VARCHAR(10),question_id INT,answer_value VARCHAR(20));
INSERT INTO answers VALUES
('Sally',1,'Pouch'),
('Sally',2,'Peach'),
('John',1,'Pooch'),
('John',2,'Duke');
INSERT INTO answers VALUES
('Sally',1,'Pooch'),
('Sally',2,'Peach'),
('John',1,'Pooch'),
('John',2,'Duck');

SELECT user_id,question_id,GROUP_CONCAT(DISTINCT answer_value) given_answers
FROM answers GROUP BY user_id,question_id;

+---------+-------------+---------------+
| user_id | question_id | given_answers |
+---------+-------------+---------------+
| John    |           1 | Pooch         |
| John    |           2 | Duke,Duck     |
| Sally   |           1 | Pouch,Pooch   |
| Sally   |           2 | Peach         |
+---------+-------------+---------------+

This display reveals that John gave two different answers to question 2 and Sally gave two different answers to question 1.

To catch which questions were answered differently by all users, just place the above query in a subquery and check for a comma in the list of given answers to get the count of distinct answers as follows:

SELECT user_id,question_id,given_answers,
(LENGTH(given_answers) - LENGTH(REPLACE(given_answers,',','')))+1 multianswer_count
FROM (SELECT user_id,question_id,GROUP_CONCAT(DISTINCT answer_value) given_answers
FROM answers GROUP BY user_id,question_id) A;

I got this:

+---------+-------------+---------------+-------------------+
| user_id | question_id | given_answers | multianswer_count |
+---------+-------------+---------------+-------------------+
| John    |           1 | Pooch         |                 1 |
| John    |           2 | Duke,Duck     |                 2 |
| Sally   |           1 | Pouch,Pooch   |                 2 |
| Sally   |           2 | Peach         |                 1 |
+---------+-------------+---------------+-------------------+

Now just filter out rows where multianswer_count = 1 using another subquery:

SELECT * FROM (SELECT user_id,question_id,given_answers,
(LENGTH(given_answers) - LENGTH(REPLACE(given_answers,',','')))+1 multianswer_count
FROM (SELECT user_id,question_id,GROUP_CONCAT(DISTINCT answer_value) given_answers
FROM answers GROUP BY user_id,question_id) A) AA WHERE multianswer_count > 1;

This is what I got:

+---------+-------------+---------------+-------------------+
| user_id | question_id | given_answers | multianswer_count |
+---------+-------------+---------------+-------------------+
| John    |           2 | Duke,Duck     |                 2 |
| Sally   |           1 | Pouch,Pooch   |                 2 |
+---------+-------------+---------------+-------------------+

Essentially, I performed three table scans: 1 on the main table, 2 on the small subqueries. NO JOINS !!!

Give it a Try !!!