I have multiple lookup tables and each need to be filtered by an array of values:
select
array_agg(joined_properties.name) as properties,
items.name
from items
left join (
select properties.name, item_property.item_id, item_property.property_id
from item_property
join properties on properties.id = item_property.property_id) as joined_properties on joined_properties.item_id = items.id
group by items.name;
properties | name |
---|---|
property1 | item2 |
property1,property2,property4 | item1 |
property2,property3 | item3 |
property2 | item5 |
property3,property4 | item4 |
I would like to filter each item by an array of properties, but return the unfiltered properties (just like above) in the same query.
What is the performant/correct way of doing it without needing to re-run the joins once I found matching rows? Whenever I filter inside the join
or outside it before grouping it only returns the matching rows of course:
select
array_agg(joined_properties.name) as properties,
items.name
from items
inner join (
select properties.name, item_property.item_id, item_property.property_id
from item_property
join properties on properties.id = item_property.property_id
where properties.name = ANY('{"property1","property2"}')
) as joined_properties on joined_properties.item_id = items.id
group by items.name;
properties | name |
---|---|
property1,property2 | item1 |
property1 | item2 |
property2 | item3 |
property2 | item5 |
Expected output:
properties | name |
---|---|
property1 | item2 |
property1,property2,property4 | item1 |
property2,property3 | item3 |
property2 | item5 |
Fiddle below:
Edit 1:
- Approximate hardware specs: 4-core Xeon Gold VM, 16GB RAM, SSD
- Relevant DB cardinalities:
- ~100k records in
items
, - 6
properties_i
text tables with their respective intermediary tables, - most
properties_i
tables range from 10-30 records, - largest
properties_i
will have 200 records indexed, - each item in
items
has 0-10 associated properties for anyproperty_i
- ~100k records in
Best Answer
The best solution heavily depends on data distribution and cardinalities - and available indexes.
This query should be generally good:
db<>fiddle here
The innermost subquery
ip0
identifies items (or rather:item_id
's) that have one of the filtered properties.The next subquery
sub
joins to all properties and aggregates per item - not involving tableitems
, yet!The outer query replaces
item_id
with actual item name to arrive at your desired result. If item names are not definedUNIQUE NOT NULL
, I would still returnitem_id
additionally to avoid ambiguities.Other query styles may be faster for particular data distributions / search arguments. Thee are many ways. I added one alternative with
IN
to the fiddle.Indexes
For big tables and selective filters, matching indexes make all the difference. For your given data distribution:
You don't need any indexes on
properties
as it's tiny.You'll already have a UNIQUE or PK constraint on
item_property (item_id, property_id)
. Add another index on the same two columns in reversed order, so(property_id, item_id)
in my example. Why? See:If
item_property
is vacuumed enough, you'll even see index-only scans.Obviously an index on
items(id)
- that will already be covered by the PK.