I'm having performance issues with a query depending on the user_id
I use in the WHERE
clause.
This question describes a very similar issue but not quite the same:
Here's my query:
select inlineview.user_search_id, inlineview.user_id,
to_char(timezone('UTC', to_timestamp(date_part('epoch',
inlineview.last_access_date))),'YYYY-MM-DD"T"HH24:MI:SS.MSZ')
last_access_date,
to_char(timezone('UTC', to_timestamp(date_part('epoch',
inlineview.sys_creation_date))),'YYYY-MM-DD"T"HH24:MI:SS.MSZ')
sys_creation_date, b.product_id,
to_char(timezone('UTC', to_timestamp(date_part('epoch',
b.last_prov_change))),'YYYY-MM-DD"T"HH24:MI:SS.MSZ')
last_prov_change,b.product_type,b.tlc_id,b.prod_history_id,
coalesce(d.address_id,-1) address_id,coalesce(d.locality_name,'')
locality_name,
coalesce(d.post_code_zone,'') post_code_zone,
coalesce(d.road_name_concat,'')
road_name_concat,coalesce(d.street_num_concat,'') street_num_concat,
coalesce(d.address_name,'') address_name,
coalesce(d.town_name,'') town_name
from
(select a.user_search_id,
a.user_id,a.last_access_date,
a.sys_creation_date,a.product_id
from linetest.user_search a
where a.user_id = '818901'
order by a.last_access_date desc limit 10) inlineview,
linetest.prod_history b left outer join linetest.address d on d.tlc_id = b.tlc_id
where b.product_id = inlineview.product_id
and b.prod_history_id = (select c.prod_history_id
from linetest.prod_history c
where c.product_id = b.product_id
group by c.prod_history_id
order by c.prod_history_id desc
limit 1)
order by
inlineview.last_access_date desc,
inlineview.user_search_id desc;
With this specific user_id
('818901') on the WHERE
clause the query takes about 2 minutes to run. If I change the user_id
value on the WHERE
clause to any other value, the query runs instantly.
Now, the most intriguing part to me is the fact that the query plan for the 'problematic' user_id
is exactly the same as for any other user_id
.
Normal/expected query performance (0.5 seconds) running the query with any other value:
Sort (cost=2522.41..2522.41 rows=1 width=136)
Sort Key: a.last_access_date DESC, a.user_search_id DESC
-> Nested Loop Left Join (cost=572.42..2522.40 rows=1 width=136)
-> Nested Loop (cost=572.00..2517.96 rows=1 width=70)
-> Limit (cost=571.57..571.60 rows=10 width=42)
-> Sort (cost=571.57..571.97 rows=159 width=42)
Sort Key: a.last_access_date DESC
-> Bitmap Heap Scan on user_search a (cost=5.66..568.14 rows=159 width=42)
Recheck Cond: ((user_id)::text = '601401'::text)
-> Bitmap Index Scan on user_search_idx (cost=0.00..5.62 rows=159 width=0)
Index Cond: ((user_id)::text = '601401'::text)
-> Index Scan using prod_history_idx on prod_history b (cost=0.42..194.62 rows=1 width=38)
Index Cond: ((product_id)::text = (a.product_id)::text)
Filter: (prod_history_id = (SubPlan 1))
SubPlan 1
-> Limit (cost=27.81..27.81 rows=1 width=8)
-> Group (cost=27.81..27.84 rows=6 width=8)
Group Key: c.prod_history_id
-> Sort (cost=27.81..27.82 rows=6 width=8)
Sort Key: c.prod_history_id DESC
-> Index Scan using prod_history_idx on prod_history c (cost=0.42..27.73 rows=6 width=8)
Index Cond: ((product_id)::text = (b.product_id)::text)
-> Index Scan using address_idx on address d (cost=0.42..4.41 rows=1 width=75)
Index Cond: ((tlc_id)::text = (b.tlc_id)::text)
Slow query performance (130+ seconds) running with the problematic user_id
:
Sort (cost=2522.41..2522.41 rows=1 width=136)
Sort Key: a.last_access_date DESC, a.user_search_id DESC
-> Nested Loop Left Join (cost=572.42..2522.40 rows=1 width=136)
-> Nested Loop (cost=572.00..2517.96 rows=1 width=70)
-> Limit (cost=571.57..571.60 rows=10 width=42)
-> Sort (cost=571.57..571.97 rows=159 width=42)
Sort Key: a.last_access_date DESC
-> Bitmap Heap Scan on user_search a (cost=5.66..568.14 rows=159 width=42)
Recheck Cond: ((user_id)::text = '818901'::text)
-> Bitmap Index Scan on user_search_idx (cost=0.00..5.62 rows=159 width=0)
Index Cond: ((user_id)::text = '818901'::text)
-> Index Scan using prod_history_idx on prod_history b (cost=0.42..194.62 rows=1 width=38)
Index Cond: ((product_id)::text = (a.product_id)::text)
Filter: (prod_history_id = (SubPlan 1))
SubPlan 1
-> Limit (cost=27.81..27.81 rows=1 width=8)
-> Group (cost=27.81..27.84 rows=6 width=8)
Group Key: c.prod_history_id
-> Sort (cost=27.81..27.82 rows=6 width=8)
Sort Key: c.prod_history_id DESC
-> Index Scan using prod_history_idx on prod_history c (cost=0.42..27.73 rows=6 width=8)
Index Cond: ((product_id)::text = (b.product_id)::text)
-> Index Scan using address_idx on address d (cost=0.42..4.41 rows=1 width=75)
Index Cond: ((tlc_id)::text = (b.tlc_id)::text)
I have played around with the indexes to try to force a change on the query plan, but in all tests the cost was the same for both values but the actual time to run the query is still quite different depending on the values.
Best Answer
General advice
That's typically the case if there are many more (or less) rows for the problematic user than for all the other users. Irregular data distribution.
In many cases Postgres could switch to a different query plan (like use a sequential scan on
user_search
instead of the bitmap index scan onuser_search_idx
we currently see. Hard to tell without more information. You might remove the index for testing and retry the expensive query to see if it gets faster. (However, with the suggested index below all cases should be covered.)General performance advice applies.
In particular, your cost and autovacuum settings are crucial. Increasing the statistics target for
linetest.user_search.user_id
may help. See:Your query
You have two subqueries with
ORDER BY ... LIMIT n
. These are particularly delicate constructs with irregular value frequencies. Including the one onlinetest.user_search
withwhere a.user_idf = '818901'
. Related:The most important piece of advice: Create a multicolumn index on
(user_id, last_access_date desc)
to match your query, and even your "fast" queries should be much faster, yet:If
last_access_date
isn't definedNOT NULL
, you may want to addNULLS LAST
in query and index. See:And another one on:
Plus, you are mixing explicit and implicit joins in your
FROM
clause, which may not work in your favor. Compare:Some other parts might be improved, too. Like the
group by
that adds cost for no gain. This equivalent query should be faster: