PostgreSQL Optimization – Best Way to Select a Matching Subset of Rows

arraymany-to-manyoptimizationpostgresql

DB Fiddle link. I have a many to many table like this:

CREATE TABLE house_to_cats (
  id SERIAL PRIMARY KEY,
  house_id INTEGER,
  cat_id INTEGER
);

-- house with cats 1 and 2: too small
INSERT INTO house_to_cats (house_id, cat_id) VALUES (1, 1), (1, 2);
-- house with cats 1 2 3 4: too big
INSERT INTO house_to_cats (house_id, cat_id) VALUES (2, 1), (2, 2), (2, 3), (2, 4);
-- house with cats 1 2 3: just right
INSERT INTO house_to_cats (house_id, cat_id) VALUES (3, 1), (3, 2), (3, 3);

I need a query that takes an arbitrary list of cats and returns a matching house if exists. I came up with this:

SELECT
    house_id
FROM (
    SELECT
          house_id
        , ARRAY_AGG(cat_id) as cat_id_agg
    FROM house_to_cats
    JOIN (
        SELECT DISTINCT
              house_id
        FROM house_to_cats
        JOIN (SELECT * FROM UNNEST(ARRAY[1, 2, 3]) cat_id) inn USING (cat_id)
    ) filter USING (house_id)
    GROUP BY house_id
) agg
WHERE cat_id_agg <@ ARRAY[1, 2, 3]
  AND cat_id_agg @> ARRAY[1, 2, 3];

Is there a better way to do this?

The idea behind my query: in filter, get the house_id which have at least one of our cat in them. In agg, create cat_id_agg arrays for all of those house_ids. And in the outermost query filter out groups that don't match our set.

Best Answer

If I understood you correctly, your query can be simplified to:

select house_id
from house_to_cats
group by house_id
having array_agg(cat_id order by cat_id) = array[1,2,3]

Note the order by in the array_agg() call - the array [3,2,1] is not equal the array [1,2,3]. To avoid incorrect results due to the aggregation being done in a different order, the aggregated array has to contain the values in the same order as the comparison value.

Online example: https://rextester.com/IKGHWL59301