I have a relatively complex query like the one that follows
select * from (
select
distinct on("car")
entity.id as "car",
metric as "last_visit"
from now() ref_date
cross join (values (142), (432), (583), (448), (381)) entity(id)
left join (
select
gv.car_id as _entity_id,
gv.start as _tc_past,
gv.start as _keep_max_of,
g.name as metric
from garage_visits gv
left join garage_tree_flat gtf on gtf.descendant_id = gv.garage_id
left join garage g on g.id = gtf.parent_id
where true
and price != 0
and g.type = 2
) x
on x._entity_id = entity.id
and _tc_past < ref_date
and _tc_past >= ref_date - interval '12 weeks'
order by "car", _keep_max_of desc
) t
where t."last_visit" is not null;
My problem is that the query running time will vary tremendously based on the number of elements that are present in cross join (values (...))
.
Namely, I have the following average running times depending on how many values
there are:
# of car_id values | avg running time |
---|---|
1 | 25ms |
2 | 30ms |
3 | 35ms |
4 | 40ms |
5 | 4900ms |
6 | 4900ms |
7 | 4900ms |
Checking the execution plan for both versions (up to 3 values or more than 3 values) shows that the culprit is the index scan on garage_visits gv
. This table contains several million rows.
In the case of up to 3 car_ids
, the index condition on garage_visits
is as follows:
Node Type: Index Scan
Index Name: ix_gv_car_id
Index Cond: (gv.car_id = "*VALUES*".column1)
Actual Duration: 0.687ms
In the case of more than 3 car_ids
, the index condition on garage_visits
is as follows:
Node Type: Index Scan
Index Name: ix_gv_garage_id
Index Cond: (gv.garage_id = gtf.descendant_id)
Actual Duration: 3386.068ms
As we can observe, with up to three values, the query planner decides to filter garage_visits
by car_id
, and outputs only these rows, which is extremely fast. But with more than three car_id values, the planner decides against using this information and instead filters garage_visits
by gtf.descendant_id
instead, which dramatically changes the execution time because there are many more gtf.descendant_id
values than there are car_ids
values.
Is there a way to keep using the ix_gv_car_id
index but with more than 3 car_id
values? It looks like it's simply a wrong choice here from the query planner and it's frustrating to see that this query could run really fast (even for 30 or 50 values) but instead spends a lot of wasted time.
Best Answer
You can force it not use that index by rewriting the join condition to
gtf.descendant_id = gv.garage_id + 0
. Whether it will then switch to the good plan, or just switch to a different bad plan is not something we can predict. You will have to try it and see.If you want use to analyze why it is making the wrong choice, you will have to show us the complete plans, not just your interpretations of them.