PostgreSQL – Impact of NULL Ordering on Btree Scan

database-internals, index, null, order-by, postgresql

In this answer jjanes says that a NULL ordering contrary to the index's declared ordering can force a quicksort. Turns out he's right, but I want to know why. If a btree stores its NULLs last, or first, why can't the scan just skip over them?

CREATE TABLE foo AS
  SELECT x::int
  FROM generate_series(1,1e6) AS gs(x);

CREATE INDEX ON foo(x DESC NULLS FIRST);

ANALYZE foo;
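
For reference, PostgreSQL ties the default NULL placement to the sort direction: ASC implies NULLS LAST and DESC implies NULLS FIRST. So these two declarations are equivalent:

CREATE INDEX ON foo(x DESC NULLS FIRST);  -- as above
CREATE INDEX ON foo(x DESC);              -- same thing: DESC defaults to NULLS FIRST

Likewise, ORDER BY x DESC below is shorthand for ORDER BY x DESC NULLS FIRST.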

With NULLS LAST (contradicting the index's NULLS FIRST),

SELECT * FROM foo WHERE x BETWEEN 100 AND 5000 ORDER BY x DESC NULLS LAST;

                                                            QUERY PLAN                                                            
----------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=414.95..426.19 rows=4494 width=4) (actual time=2.660..3.147 rows=4901 loops=1)
   Sort Key: x DESC NULLS LAST
   Sort Method: quicksort  Memory: 422kB
   ->  Index Only Scan using foo_x_idx on foo  (cost=0.42..142.31 rows=4494 width=4) (actual time=0.031..1.522 rows=4901 loops=1)
         Index Cond: ((x >= 100) AND (x <= 5000))
         Heap Fetches: 0
 Planning time: 0.265 ms
 Execution time: 3.633 ms
(8 rows)

Without NULLS LAST (DESC defaults to NULLS FIRST, which matches the index),

SELECT * FROM foo WHERE x BETWEEN 100 AND 5000 ORDER BY x DESC;

                                                         QUERY PLAN                                                         
----------------------------------------------------------------------------------------------------------------------------
 Index Only Scan using foo_x_idx on foo  (cost=0.42..142.31 rows=4494 width=4) (actual time=0.040..1.611 rows=4901 loops=1)
   Index Cond: ((x >= 100) AND (x <= 5000))
   Heap Fetches: 0
 Planning time: 0.192 ms
 Execution time: 2.021 ms
(5 rows)

It does seem that a NULL ordering contrary to the index declaration forces a quicksort on top of the index-only scan, rather than letting the index return rows in the requested order. I have no idea why, though.
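
Presumably a second index whose NULL placement matches the ORDER BY would let the planner skip the sort. Untested sketch, index name mine:

CREATE INDEX foo_x_desc_nulls_last_idx ON foo(x DESC NULLS LAST);

SELECT * FROM foo WHERE x BETWEEN 100 AND 5000 ORDER BY x DESC NULLS LAST;
-- expected: a plain Index Only Scan using foo_x_desc_nulls_last_idx, no Sort node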

The forced sort even appears if the column is declared NOT NULL:

ALTER TABLE foo ALTER COLUMN x SET NOT NULL;
VACUUM ANALYZE foo;
SELECT * FROM foo WHERE x BETWEEN 100 AND 5000 ORDER BY x DESC NULLS LAST;

                                                            QUERY PLAN                                                            
----------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=477.64..490.43 rows=5113 width=4) (actual time=2.681..3.173 rows=4901 loops=1)
   Sort Key: x DESC NULLS LAST
   Sort Method: quicksort  Memory: 422kB
   ->  Index Only Scan using foo_x_idx on foo  (cost=0.42..162.69 rows=5113 width=4) (actual time=0.029..1.506 rows=4901 loops=1)
         Index Cond: ((x >= 100) AND (x <= 5000))
         Heap Fetches: 0
 Planning time: 0.225 ms
 Execution time: 3.662 ms
(8 rows)
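
Of course, once x is declared NOT NULL the NULLS LAST clause can't change the result set, so the simplest workaround is presumably to drop it and let the index supply the ordering, as in the second plan above:

-- x cannot be NULL, so DESC and DESC NULLS LAST return identical rows,
-- and plain DESC matches the index's sort order:
SELECT * FROM foo WHERE x BETWEEN 100 AND 5000 ORDER BY x DESC;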

Best Answer

I asked the question in IRC; it seems they just never implemented it.

< johto> EvanCarroll; "because nobody implemented it" is usually the answer
< macdice> EvanCarroll: we absolutely could implement that and it's been discussed in this channel.  the codez have not been forthcoming
< macdice> an index scan that is smart enough to skip null (start after null) and then skip back to the nulls at the end
< macdice> probably the easiest of many potential skip scans tricks we could teach indexes to do
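
In the meantime, the skip macdice describes can be emulated by hand: read the non-NULL part of the index in order, then tack the NULLs on at the end. A sketch, assuming x were still nullable (an Append of ordered branches preserves their order in practice, though UNION ALL itself makes no formal ordering guarantee):

(SELECT * FROM foo WHERE x IS NOT NULL ORDER BY x DESC)  -- btree can use IS NOT NULL as an index condition
UNION ALL
(SELECT * FROM foo WHERE x IS NULL);                     -- the skipped NULLs, appended at the end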

(pending other potential answers, I'll just accept this)