Filter table lookup by array value, but returning groupby unfiltered

many-to-manypostgresqlpostgresql-13postgresql-performance

I have multiple lookup tables and each need to be filtered by an array of values:

select 
array_agg(joined_properties.name) as properties,
items.name
from items
left join (
  select properties.name, item_property.item_id, item_property.property_id
  from item_property
  join properties on properties.id = item_property.property_id) as joined_properties on joined_properties.item_id = items.id
group by items.name;

properties name
property1 item2
property1,property2,property4 item1
property2,property3 item3
property2 item5
property3,property4 item4

I would like to filter each item by an array of properties, but return the unfiltered properties (just like above) in the same query.
What is the performant/correct way of doing it without needing to re-run the joins once I found matching rows? Whenever I filter inside the join or outside it before grouping it only returns the matching rows of course:

select 
array_agg(joined_properties.name) as properties,
items.name
from items
inner join (
  select properties.name, item_property.item_id, item_property.property_id
  from item_property
  join properties on properties.id = item_property.property_id
  where properties.name = ANY('{"property1","property2"}')
) as joined_properties on joined_properties.item_id = items.id
group by items.name;
properties name
property1,property2 item1
property1 item2
property2 item3
property2 item5

Expected output:

properties name
property1 item2
property1,property2,property4 item1
property2,property3 item3
property2 item5

Fiddle below:

View on DB Fiddle

Edit 1:

  • Approximate hardware specs: 4-core Xeon Gold VM, 16GB RAM, SSD
  • Relevant DB cardinalities:
    • ~100k records in items,
    • 6 properties_i text tables with their respective intermediary tables,
    • most properties_i tables range from 10-30 records,
    • largest properties_i will have 200 records indexed,
    • each item in items has 0-10 associated properties for any property_i

Best Answer

The best solution heavily depends on data distribution and cardinalities - and available indexes.

This query should be generally good:


SELECT sub.properties, i.name AS item
FROM  (
   SELECT item_id, array_agg(p.name) AS properties
   FROM  (
      SELECT DISTINCT item_id
      FROM   item_property
      WHERE  property_id = ANY (
         SELECT id FROM properties
         WHERE  name = ANY ('{"property1","property2"}')
         )
      ) ip0
   JOIN   item_property ip USING (item_id)
   JOIN   properties    p  ON p.id = ip.property_id
   GROUP  BY item_id
   ) sub
JOIN   items i ON i.id = sub.item_id
ORDER  BY i.name;  -- my optional addition

db<>fiddle here

The innermost subquery ip0 identifies items (or rather: item_id's) that have one of the filtered properties.
The next subquery sub joins to all properties and aggregates per item - not involving table items, yet!
The outer query replaces item_id with actual item name to arrive at your desired result. If item names are not defined UNIQUE NOT NULL, I would still return item_id additionally to avoid ambiguities.

Other query styles may be faster for particular data distributions / search arguments. Thee are many ways. I added one alternative with IN to the fiddle.

Indexes

For big tables and selective filters, matching indexes make all the difference. For your given data distribution:

  • You don't need any indexes on properties as it's tiny.

  • You'll already have a UNIQUE or PK constraint on item_property (item_id, property_id). Add another index on the same two columns in reversed order, so (property_id, item_id) in my example. Why? See:

    If item_property is vacuumed enough, you'll even see index-only scans.

  • Obviously an index on items(id) - that will already be covered by the PK.