PostgreSQL Index Only Scan – When to Use Over Bitmap Index + Heap Scan

bitmap-indexpostgresql

Today I've been doing some reading on the difference between the two and I think I've got a reasonable idea of each.

Index Only: Accesses heap pages as it moves through the index, possibly access the same page multiple time.

Bitmap Index Scan + Bitmap Heap Scan: Builds a bitmap of heap pages that need to be visited containing a matching tuple. The map(s) are then combined and sorted and heap pages accessed.

What I didn't get a definitive answer on, is the conditions that will determine when one will be used over the other. As far as I can gather, these are some of them:

  1. The retrieved columns are a subset of the indexed columns.
  2. A vacuumed and analyzed table (index only will only work on these)
  3. Number of rows that will be fetched – though this didn't make sense to me.

Best Answer

Index Only: Will not visit the heap at all for a block if the block is marked as all-visible in the visibility map. If it does need to visit a block, it might do so on multiple occasions, and those occasions might be separated from each other by a lot of time because multiple distant index entries can point to the same block. During the intervening time, the block might have been pushed out of the cache and so need to be read back in.

Bitmap: Cannot use the visibility map to avoid visiting a block. All visits to the same block will occur back-to-back, so it is unlikely to get shoved out of cache between visits. It also visits in physical order even when the index correlation is low, and so gets credited with having its table heap IO be more sequential and less random. But that is only important when you read a substantial part of the table, meaning fetching a lot of rows.

Higher pg_class.relallvisible means it is more likely than an IOS will not need to visit the table heap at all for a block, and so favors IOS. Keeping relallvisible high requires vacuuming.

Higher pg_stats.correlation for the table column that is the leading column of the index means that multiple visits to a block are likely to occur from the same region of the index, and so the block is unlikely to be shoved out of the cache between visits when using an IOS, and so favors an IOS.

Higher values of effective_cache_size means blocks are estimated to stay in the cache for longer meaning repeated distant visits are likely to find them in the cache still, which again favors IOS.

These have a complex relationship. If all blocks are marked allvisible, then the correlation and effective_cache_size do not matter, as there will not be any heap visits at all.