Postgresql – ORDER BY clause kills query performance

order-byperformancepostgresqlpostgresql-10postgresql-performance

Context:

PostgreSQL 10, with 3667438 records in users table, the users table has a JSONB called social, we usually use a strategy of indexing computed function outputs, so we can aggregate information into a single index.
The output of the engagement(social) function it a double precision numeric type.

Problem:

The problematic clause is ORDER BY engagement(social) DESC NULLS LAST, there is also a btree index idx_in_social_engagement with DESC NULLS LAST attached to this data.

Fast query:

EXPLAIN ANALYZE
SELECT  "users".* FROM "users"
WHERE (follower_count(social) < 500000)
AND (engagement(social) > 0.03)
AND (engagement(social) < 0.25)
AND (peemv(social) < 533)
ORDER BY "users"."created_at" ASC
LIMIT 12 OFFSET 0;

Limit  (cost=0.43..52.25 rows=12 width=1333) (actual time=0.113..1.625 
rows=12 loops=1)
   ->  Index Scan using created_at_idx on users  (cost=0.43..7027711.55 rows=1627352 width=1333) (actual time=0.112..1.623 rows=12 loops=1)
         Filter: ((follower_count(social) < 500000) AND (engagement(social) > '0.03'::double precision) AND (engagement(social) <  '0.25'::double precision) AND (peemv(social) > '0'::double precision) AND (peemv(social) < '533'::double precision))
         Rows Removed by Filter: 8
 Planning time: 0.324 ms
 Execution time: 1.639 ms

Slow query:

EXPLAIN ANALYZE 
SELECT  "users".* FROM "users" 
WHERE (follower_count(social) < 500000) 
AND (engagement(social) > 0.03) 
AND (engagement(social) < 0.25) 
AND (peemv(social) > 0.0) 
AND (peemv(social) < 533) 
ORDER BY engagement(social) DESC NULLS LAST, "users"."created_at" ASC 
LIMIT 12 OFFSET 0;

Limit  (cost=2884438.00..2884438.03 rows=12 width=1341) (actual time=68011.728..68011.730 rows=12 loops=1)
->  Sort  (cost=2884438.00..2888506.38 rows=1627352 width=1341) (actual time=68011.727..68011.728 rows=12 loops=1)
        Sort Key: (engagement(social)) DESC NULLS LAST, created_at
        Sort Method: top-N heapsort  Memory: 45kB
        ->  Index Scan using idx_in_social_engagement on users  (cost=0.43..2847131.26 rows=1627352 width=1341) (actual time=0.082..67019.102 rows=1360633 loops=1)
            Index Cond: ((engagement(social) > '0.03'::double precision) AND (engagement(social) < '0.25'::double precision))
            Filter: ((follower_count(social) < 500000) AND (peemv(social) > '0'::double precision) AND (peemv(social) < '533'::double precision))
            Rows Removed by Filter: 85580
Planning time: 0.312 ms
Execution time: 68011.752 ms

The select goes with * because I need all the data stored in each row.

Update:

CREATE INDEX idx_in_social_engagement on influencers USING BTREE ( engagement(social) DESC NULLS LAST)

Exact index definition

Best Answer

Your ORDER BY clause is on:

engagement(social) DESC NULLS LAST, "users"."created_at" ASC

But I suspect your index is just on:

engagement(social) DESC NULLS LAST

So the index is not capable of fully supporting the ORDER BY.

You can reproduce the same issue without using either JSONB or expression indexes. You might be able to salvage the situation by creating a composite expression index over both columns in your ORDER BY.

If the PostgreSQL planner were infinitely wise, it might be able to use the existing index efficiently. It would have to march up engagement(social) DESC NULLS LAST until it collected 12 tuples which met all the rest of the filter requirements. Then it would have continue marching up that index until it collected all the rest of the tuples which were tied on engagement(social) with the 12th tuple (and which met the other criteria). Then it would have to re-sort all of the collected tuples on the full ORDER BY, and apply the LIMIT 12 to that expanded and re-sorted set. But the PostgreSQL planner is not infinitely wise.