Context:
PostgreSQL 10, with 3667438 records in users table, the users table has a JSONB called social, we usually use a strategy of indexing computed function outputs, so we can aggregate information into a single index.
The output of the engagement(social)
function it a double precision numeric type.
Problem:
The problematic clause is ORDER BY engagement(social) DESC NULLS LAST
, there is also a btree index idx_in_social_engagement with DESC NULLS LAST
attached to this data.
Fast query:
EXPLAIN ANALYZE
SELECT "users".* FROM "users"
WHERE (follower_count(social) < 500000)
AND (engagement(social) > 0.03)
AND (engagement(social) < 0.25)
AND (peemv(social) < 533)
ORDER BY "users"."created_at" ASC
LIMIT 12 OFFSET 0;
Limit (cost=0.43..52.25 rows=12 width=1333) (actual time=0.113..1.625
rows=12 loops=1)
-> Index Scan using created_at_idx on users (cost=0.43..7027711.55 rows=1627352 width=1333) (actual time=0.112..1.623 rows=12 loops=1)
Filter: ((follower_count(social) < 500000) AND (engagement(social) > '0.03'::double precision) AND (engagement(social) < '0.25'::double precision) AND (peemv(social) > '0'::double precision) AND (peemv(social) < '533'::double precision))
Rows Removed by Filter: 8
Planning time: 0.324 ms
Execution time: 1.639 ms
Slow query:
EXPLAIN ANALYZE
SELECT "users".* FROM "users"
WHERE (follower_count(social) < 500000)
AND (engagement(social) > 0.03)
AND (engagement(social) < 0.25)
AND (peemv(social) > 0.0)
AND (peemv(social) < 533)
ORDER BY engagement(social) DESC NULLS LAST, "users"."created_at" ASC
LIMIT 12 OFFSET 0;
Limit (cost=2884438.00..2884438.03 rows=12 width=1341) (actual time=68011.728..68011.730 rows=12 loops=1)
-> Sort (cost=2884438.00..2888506.38 rows=1627352 width=1341) (actual time=68011.727..68011.728 rows=12 loops=1)
Sort Key: (engagement(social)) DESC NULLS LAST, created_at
Sort Method: top-N heapsort Memory: 45kB
-> Index Scan using idx_in_social_engagement on users (cost=0.43..2847131.26 rows=1627352 width=1341) (actual time=0.082..67019.102 rows=1360633 loops=1)
Index Cond: ((engagement(social) > '0.03'::double precision) AND (engagement(social) < '0.25'::double precision))
Filter: ((follower_count(social) < 500000) AND (peemv(social) > '0'::double precision) AND (peemv(social) < '533'::double precision))
Rows Removed by Filter: 85580
Planning time: 0.312 ms
Execution time: 68011.752 ms
The select goes with * because I need all the data stored in each row.
Update:
CREATE INDEX idx_in_social_engagement on influencers USING BTREE ( engagement(social) DESC NULLS LAST)
Exact index definition
Best Answer
Your
ORDER BY
clause is on:But I suspect your index is just on:
So the index is not capable of fully supporting the
ORDER BY
.You can reproduce the same issue without using either
JSONB
or expression indexes. You might be able to salvage the situation by creating a composite expression index over both columns in yourORDER BY
.If the PostgreSQL planner were infinitely wise, it might be able to use the existing index efficiently. It would have to march up
engagement(social) DESC NULLS LAST
until it collected 12 tuples which met all the rest of the filter requirements. Then it would have continue marching up that index until it collected all the rest of the tuples which were tied onengagement(social)
with the 12th tuple (and which met the other criteria). Then it would have to re-sort all of the collected tuples on the fullORDER BY
, and apply theLIMIT 12
to that expanded and re-sorted set. But the PostgreSQL planner is not infinitely wise.