PostgreSQL query performance varies depending on WHERE clause values

execution-planindexperformancepostgresqlpostgresql-performance

I'm having performance issues with a query depending on the user_id I use in the WHERE clause.

This question describes a very similar issue but not quite the same:

Here's my query:

select inlineview.user_search_id, inlineview.user_id, 
       to_char(timezone('UTC', to_timestamp(date_part('epoch', 
       inlineview.last_access_date))),'YYYY-MM-DD"T"HH24:MI:SS.MSZ') 
       last_access_date,
       to_char(timezone('UTC', to_timestamp(date_part('epoch', 
       inlineview.sys_creation_date))),'YYYY-MM-DD"T"HH24:MI:SS.MSZ') 
       sys_creation_date, b.product_id,
       to_char(timezone('UTC', to_timestamp(date_part('epoch', 
       b.last_prov_change))),'YYYY-MM-DD"T"HH24:MI:SS.MSZ') 
      last_prov_change,b.product_type,b.tlc_id,b.prod_history_id,
      coalesce(d.address_id,-1) address_id,coalesce(d.locality_name,'') 
      locality_name,
      coalesce(d.post_code_zone,'') post_code_zone,
      coalesce(d.road_name_concat,'') 
      road_name_concat,coalesce(d.street_num_concat,'') street_num_concat,
      coalesce(d.address_name,'') address_name,
      coalesce(d.town_name,'') town_name
from
    (select a.user_search_id, 
            a.user_id,a.last_access_date,
            a.sys_creation_date,a.product_id
     from linetest.user_search a
    where a.user_id = '818901'
    order by a.last_access_date desc limit 10) inlineview, 
 linetest.prod_history b left outer join linetest.address d on d.tlc_id = b.tlc_id 
 where b.product_id = inlineview.product_id
   and b.prod_history_id = (select c.prod_history_id 
                            from linetest.prod_history c
                            where c.product_id = b.product_id
                            group by c.prod_history_id
                            order by c.prod_history_id desc
                            limit 1)
order by 
    inlineview.last_access_date desc, 
    inlineview.user_search_id desc;

With this specific user_id ('818901') on the WHERE clause the query takes about 2 minutes to run. If I change the user_id value on the WHERE clause to any other value, the query runs instantly.

Now, the most intriguing part to me is the fact that the query plan for the 'problematic' user_id is exactly the same as for any other user_id.

Normal/expected query performance (0.5 seconds) running the query with any other value:

Sort  (cost=2522.41..2522.41 rows=1 width=136)
  Sort Key: a.last_access_date DESC, a.user_search_id DESC
  ->  Nested Loop Left Join  (cost=572.42..2522.40 rows=1 width=136)
        ->  Nested Loop  (cost=572.00..2517.96 rows=1 width=70)
              ->  Limit  (cost=571.57..571.60 rows=10 width=42)
                    ->  Sort  (cost=571.57..571.97 rows=159 width=42)
                          Sort Key: a.last_access_date DESC
                          ->  Bitmap Heap Scan on user_search a  (cost=5.66..568.14 rows=159 width=42)
                                Recheck Cond: ((user_id)::text = '601401'::text)
                                ->  Bitmap Index Scan on user_search_idx  (cost=0.00..5.62 rows=159 width=0)
                                      Index Cond: ((user_id)::text = '601401'::text)
              ->  Index Scan using prod_history_idx on prod_history b  (cost=0.42..194.62 rows=1 width=38)
                    Index Cond: ((product_id)::text = (a.product_id)::text)
                    Filter: (prod_history_id = (SubPlan 1))
                    SubPlan 1
                      ->  Limit  (cost=27.81..27.81 rows=1 width=8)
                            ->  Group  (cost=27.81..27.84 rows=6 width=8)
                                  Group Key: c.prod_history_id
                                  ->  Sort  (cost=27.81..27.82 rows=6 width=8)
                                        Sort Key: c.prod_history_id DESC
                                        ->  Index Scan using prod_history_idx on prod_history c  (cost=0.42..27.73 rows=6 width=8)
                                              Index Cond: ((product_id)::text = (b.product_id)::text)
        ->  Index Scan using address_idx on address d  (cost=0.42..4.41 rows=1 width=75)
              Index Cond: ((tlc_id)::text = (b.tlc_id)::text)

Slow query performance (130+ seconds) running with the problematic user_id:

Sort  (cost=2522.41..2522.41 rows=1 width=136)
  Sort Key: a.last_access_date DESC, a.user_search_id DESC
  ->  Nested Loop Left Join  (cost=572.42..2522.40 rows=1 width=136)
        ->  Nested Loop  (cost=572.00..2517.96 rows=1 width=70)
              ->  Limit  (cost=571.57..571.60 rows=10 width=42)
                    ->  Sort  (cost=571.57..571.97 rows=159 width=42)
                          Sort Key: a.last_access_date DESC
                          ->  Bitmap Heap Scan on user_search a  (cost=5.66..568.14 rows=159 width=42)
                                Recheck Cond: ((user_id)::text = '818901'::text)
                                ->  Bitmap Index Scan on user_search_idx  (cost=0.00..5.62 rows=159 width=0)
                                      Index Cond: ((user_id)::text = '818901'::text)
              ->  Index Scan using prod_history_idx on prod_history b  (cost=0.42..194.62 rows=1 width=38)
                    Index Cond: ((product_id)::text = (a.product_id)::text)
                    Filter: (prod_history_id = (SubPlan 1))
                    SubPlan 1
                      ->  Limit  (cost=27.81..27.81 rows=1 width=8)
                            ->  Group  (cost=27.81..27.84 rows=6 width=8)
                                  Group Key: c.prod_history_id
                                  ->  Sort  (cost=27.81..27.82 rows=6 width=8)
                                        Sort Key: c.prod_history_id DESC
                                        ->  Index Scan using prod_history_idx on prod_history c  (cost=0.42..27.73 rows=6 width=8)
                                              Index Cond: ((product_id)::text = (b.product_id)::text)
        ->  Index Scan using address_idx on address d  (cost=0.42..4.41 rows=1 width=75)
              Index Cond: ((tlc_id)::text = (b.tlc_id)::text)

I have played around with the indexes to try to force a change on the query plan, but in all tests the cost was the same for both values but the actual time to run the query is still quite different depending on the values.

Best Answer

General advice

in all tests the cost was the same for both values but the actual time to run the query is still quite different depending on the values.

That's typically the case if there are many more (or less) rows for the problematic user than for all the other users. Irregular data distribution.

In many cases Postgres could switch to a different query plan (like use a sequential scan on user_search instead of the bitmap index scan on user_search_idx we currently see. Hard to tell without more information. You might remove the index for testing and retry the expensive query to see if it gets faster. (However, with the suggested index below all cases should be covered.)

General performance advice applies.

In particular, your cost and autovacuum settings are crucial. Increasing the statistics target for linetest.user_search.user_id may help. See:

Your query

You have two subqueries with ORDER BY ... LIMIT n. These are particularly delicate constructs with irregular value frequencies. Including the one on linetest.user_search with where a.user_idf = '818901'. Related:

The most important piece of advice: Create a multicolumn index on (user_id, last_access_date desc) to match your query, and even your "fast" queries should be much faster, yet:

CREATE INDEX foo ON linetest.user_search (user_id, last_access_date desc)

If last_access_date isn't defined NOT NULL, you may want to add NULLS LAST in query and index. See:

And another one on:

CREATE INDEX bar ON linetest.prod_history (product_id, c.prod_history_id DESC)

Plus, you are mixing explicit and implicit joins in your FROM clause, which may not work in your favor. Compare:

Some other parts might be improved, too. Like the group by that adds cost for no gain. This equivalent query should be faster:

SELECT ...

FROM  (
   SELECT user_search_id, user_id, last_access_date, sys_creation_date, product_id
   FROM   linetest.user_search
   WHERE  user_id = '818901'        -- why the quotes? type?
   ORDER  BY last_access_date DESC  -- column is NOT NULL?
   LIMIT  10
   ) i
JOIN   linetest.prod_history b USING (product_id)
JOIN   LATERAL (
   SELECT prod_history_id 
   FROM   linetest.prod_history
   WHERE  product_id = b.product_id
   -- GROUP  BY prod_history_id     -- pointless!
   ORDER  BY prod_history_id DESC   -- make sure column is NOT NULL
   LIMIT  1
   ) c USING (prod_history_id)
LEFT   JOIN linetest.address d USING (tlc_id)
ORDER  BY i.last_access_date DESC, i.user_search_id DESC;