I have table with two indexes:
- latest_channel_snapshots_views_idx (view_count DESC NULLS LAST)
- latest_channel_snapshots_network_views_idx (network_id, view_count DESC NULLS LAST)
What I want is to be able to sort by view_count overall, and within specific network. Postgres uses correct index for both of these cases. However, if i want to find record with most views where network_id is NULL, it uses the first index, and filters out the , thus performing really slow:
explain analyze SELECT *
FROM latest_channel_snapshots
WHERE network_id IS NULL
ORDER BY view_count DESC NULLS LAST
LIMIT 5 OFFSET 500000;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=39599.04..39599.44 rows=5 width=74) (actual time=1926.271..1926.287 rows=5 loops=1)
-> Index Scan using latest_channel_snapshots_views_idx on latest_channel_snapshots (cost=0.00..42001.59 rows=530336 width=74) (actual time=0.060..1899.224 rows=500005 loops=1)
Filter: (network_id IS NULL)
Rows Removed by Filter: 305022
Total runtime: 1926.309 ms
(5 rows)
Best Answer
You have two issues here:
large offset: PG will need to scan through the provided offset number of rows and will prefer an index to do this above a table scan.
how to index the data: I think a partial index might help you here. Can you try
create index xxx_idx on latest_channel_snapshots(view_count DESC NULLS LAST) where network_id is null;
See http://www.postgresql.org/docs/9.2/static/indexes-partial.html for more information on partial indexes. In particular, example 11-2 on unbilled orders seems to apply to your case.