PostgreSQL – Make IN Clause Behave Like AND Clause

postgresqlrelational-division

I've asked the same question on stackoverlow, but got no answer, so that's why I'm asking it here (I've read that simple sql questions should be asked at SO, but I have no choice).

Suppose I have 3 tables: posts, post_categories and categories. I'm implementing some kind of page with filters where I can filter posts by categories. User can select multiple categories. When he selects more than one, they should be summed. I can't get it working. What I have now is simple IN() SQL clause:

SELECT posts.id, post_categories.category_id FROM posts 
JOIN post_categories ON posts.id = post_categories.post_id 
WHERE post_categories.category_id IN (1,2,3) LIMIT 10;

But it is not matching all ids, it is OR and I need AND. What I need here is to find all posts that have categories with id=1 AND id=2 AND id=3, but instead this query returns posts that have at least one category from the IN() list.

Best Answer

The IN() operator is equivalent to a series of OR, which doesn't help you at all here.

Instead, I would build the query in a manner like this.

SELECT posts.id
FROM posts 
WHERE posts.id IN (SELECT post_id FROM post_categories WHERE category_id=1) AND
      posts.id IN (SELECT post_id FROM post_categories WHERE category_id=2) AND
      posts.id IN (SELECT post_id FROM post_categories WHERE category_id=3)

Admittedly, not a pretty construct, but I wrote it for readability. The following query might perform better:

SELECT posts.id
FROM posts
-- include:
WHERE posts.id IN (
        SELECT post_id
        FROM post_categories
        WHERE category_id IN (1, 2, 3)
        GROUP BY post_id
        HAVING COUNT(*)=3)
-- exclude:
    AND posts.id NOT IN (
        SELECT post_id
        FROM post_categories
        WHERE category_id NOT IN (1, 2, 3))

The second query assumes that the primary key of post_categories is (post_id, category_id). Since you haven't specified which rdbms you're running, you may have to tweak my code a bit to make it run.

Edit: The -- exclude: part eliminates posts that have any other category_id than 1, 2 or 3. You may want to skip this part depending on if you want to return a) all posts that have categories 1, 2 and 3 or b) all posts that have exactly categories 1, 2 and 3 and no other categories.