Can PostgreSQL Use Nulls in Indexes?

indexindex-tuningpostgresqlpostgresql-9.1

I've been reading this book which says that

The database assumes that Indexed_Col IS NOT NULL covers too large a range to be useful, so the database will not drive to an index from this condition.

I recognize that the book is more than 10 years old, but it has already proven quite useful–Using instructions gleaned from its pages, I've sped a query up by a factor of ten.

Further, in running EXPLAIN ANALYZE on a SELECT query, I've found that none of my indexes are being used, even when by all rights, they ought to be.

Thus, my question is:

Supposing there is a table which has a column, whose column definition includes "NOT NULL", and that an index exists which covers this column, would this index be used in a query of that table where the columns is part of the query?

Like:

CREATE TABLE my_table(
a varchar NOT NULL
);

CREATE INDEX ix_my_table ON my_table(a);

SELECT a from my_table;

Best Answer

PostgreSQL certainly can use an index for IS NOT NULL. I don't see any query planner assumptions about that condition, either.

If the null fraction for the column (pg_statistic.stanullfrac) is low enough to suggest that the index is usefully selective for the query, PostgreSQL will use an index.

I can't figure out what you're trying to say with:

If this is correct, is my understanding that an index on a column defined as "NOT NULL" not be used in a query which uses that column?

Certainly an index won't get used for an IS NOT NULL condition on a NOT NULL column. It'd always match 100% of rows, so a seqscan will almost always be much faster.

PostgreSQL won't use an index if the index doesn't filter out a large proportion of rows for a query. The only likely exception is when you're asking for a set of columns covered by a single index, in an order matching that of the index. PostgreSQL might do an index-only scan then. E.g. if there's an index on t(a, b, c) and you:

select a, b FROM t ORDER BY a, b, c;

PostgreSQL might use your index, even though no rows are filtered out, because it only has to read the index and can skip reading the heap, avoid doing a sort, etc.