Postgresql – Confused about indexing for IS NOT NULL

indexpostgresql

I have a PostgreSQL table with a timestamp field processed. I run a lot of queries like this:

SELECT blah1, blah2 FROM tbl WHERE processed IS NOT NULL

According to this question, I can create a partial index like this to speed this up:

CREATE INDEX idx_tbl_processed
    ON tbl (processed)
    WHERE processed IS NOT NULL;

My question is, with this partial index, wouldn't the database have to look up rows with processed not null (which is not fully indexed, so must be slow) before it gets to the indexed part of the table (which in this case is processed itself!)?

In general, does a partial index conditioned on some_column make queries on some_column itself faster?

Best Answer

My question is, with this partial index, wouldn't the database have to look up rows with processed not null (which is not fully indexed, so must be slow) before it gets to the indexed part of the table (which in this case is processed itself!)?

No, an index on IS NOT NULL has no need in itself to visit the table on queries restricted to IS NOT NULL or on queries which can be determined to be not-null. In fact, on queries restricted to the column where the value IS NOT NULL, you could potentially be doing an INDEX ONLY SCAN which I demonstrate here.

CREATE TABLE foo AS
  SELECT x::int, null::int AS bar
  FROM generate_series(1,1e6)
    AS gs(x);
UPDATE foo
  SET bar = 42
  WHERE x = 185093;


CREATE INDEX ON foo (bar)
  WHERE bar IS NOT NULL;

VACUUM ANALYZE foo;

EXPLAIN ANALYZE SELECT * FROM foo WHERE bar = 42;
                                                   QUERY PLAN                                                    
-----------------------------------------------------------------------------------------------------------------
 Index Scan using foo_bar_idx on foo  (cost=0.12..8.14 rows=1 width=8) (actual time=0.012..0.013 rows=1 loops=1)
   Index Cond: (bar = 42)
 Planning time: 0.295 ms
 Execution time: 0.047 ms
(4 rows)