Postgresql – How to index refined null values

postgresqlpostgresql-9.2

I have table with two indexes:

  • latest_channel_snapshots_views_idx (view_count DESC NULLS LAST)
  • latest_channel_snapshots_network_views_idx (network_id, view_count DESC NULLS LAST)

What I want is to be able to sort by view_count overall, and within specific network. Postgres uses correct index for both of these cases. However, if i want to find record with most views where network_id is NULL, it uses the first index, and filters out the , thus performing really slow:

explain analyze SELECT *
FROM latest_channel_snapshots
WHERE network_id IS NULL
ORDER BY view_count DESC NULLS LAST
LIMIT 5 OFFSET 500000;

                                                                                     QUERY PLAN                                                                                      
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=39599.04..39599.44 rows=5 width=74) (actual time=1926.271..1926.287 rows=5 loops=1)
   ->  Index Scan using latest_channel_snapshots_views_idx on latest_channel_snapshots  (cost=0.00..42001.59 rows=530336 width=74) (actual time=0.060..1899.224 rows=500005 loops=1)
         Filter: (network_id IS NULL)
         Rows Removed by Filter: 305022
 Total runtime: 1926.309 ms
(5 rows)

Best Answer

You have two issues here:

  1. large offset: PG will need to scan through the provided offset number of rows and will prefer an index to do this above a table scan.

  2. how to index the data: I think a partial index might help you here. Can you try

    create index xxx_idx on latest_channel_snapshots(view_count DESC NULLS LAST) where network_id is null;

    See http://www.postgresql.org/docs/9.2/static/indexes-partial.html for more information on partial indexes. In particular, example 11-2 on unbilled orders seems to apply to your case.