This is the relational division problem and there is a question about it at SO, with a lot of ways to write this query, plus performance analysis for PostgreSQL: How to filter SQL results in a has-many-through relation
Shamelessly copying code form there and removing/changing code for answers that have features lacking from MySQL, like CTEs, EXCEPT
, INTERSECT
, etc, here are a few ways to do this.
Assumptions:
- the table is called
factors
- there is a
UNIQUE
constraint on (wordid, docid)
- there is a
documents
and a words
table:
Easy to write, medium efficiency:
-- Query 1 -- by Martin
SELECT d.docid, d.docname
FROM document d
JOIN factors f USING (docid)
WHERE f.wordid IN (2, 4, 5)
GROUP BY d.docid
HAVING COUNT(*) = 3 ; -- number of words
Easy to write, medium efficiency:
-- Query 2 -- by Erwin
SELECT d.docid, d.docname
FROM documents d
JOIN (
SELECT docid
FROM factors
WHERE wordid IN (2, 4, 5)
GROUP BY docid
HAVING COUNT(*) = 3
) f USING (docid) ;
More complex to write, very good efficiency in Postgres - probably lousy in MySQL:
-- Query 4 -- by Derek
SELECT d.docid, d.docname
FROM documents d
WHERE d.docid IN (SELECT docid FROM factors WHERE wordid = 2)
AND d.docid IN (SELECT docid FROM factors WHERE wordid = 4);
AND d.docid IN (SELECT docid FROM factors WHERE wordid = 5);
More complex to write, very good efficiency in Postgres - and probably the same in MySQL:
-- Query 5 -- by Erwin
SELECT d.docid, d.docname
FROM documents d
WHERE EXISTS (SELECT * FROM factors
WHERE docid = d.docid AND wordid = 2)
AND EXISTS (SELECT * FROM factors
WHERE docid = d.docid AND wordid = 4)
AND EXISTS (SELECT * FROM factors
WHERE docid = d.docid AND wordid = 5) ;
More complex to write, very good efficiency in Postgres - and probably the same in MySQL:
-- Query 6 -- by Sean
SELECT d.docid, d.docname
FROM documents d
JOIN factors x ON d.docid = x.docid
JOIN factors y ON d.docid = y.docid
JOIN factors z ON d.docid = z.docid
WHERE x.wordid = 2
AND y.wordid = 4
AND z.wordid = 5 ;
Easy to write and extend to an arbitrary set of words
but not as efficient as the JOIN
and EXISTS
solutions:
-- Query 7 -- by ypercube
SELECT d.docid, d.docname
FROM documents d
WHERE NOT EXISTS (
SELECT *
FROM words AS w
WHERE w.wordid IN (2, 4, 5)
AND NOT EXISTS (
SELECT *
FROM factors AS f
WHERE f.docid = d.docid
AND f.wordid = w.wordid
)
);
Easy to write, not good efficiency:
-- Query 8 -- by ypercube
SELECT d.docid, d.docname
FROM documents d
WHERE NOT EXISTS (
SELECT *
FROM (
SELECT 2 AS wordid UNION ALL
SELECT 4 UNION ALL
SELECT 5
) AS w
WHERE NOT EXISTS (
SELECT *
FROM factors AS f
WHERE f.docid = d.docid
AND f.wordid = w.wordid
)
);
Enjoy testing them :)
Best Answer
Please try this query:
http://sqlfiddle.com/#!2/ce1b0a/7