In this case, it's doing an Index Scan. As best as I know, the difference between an Index scan and a Bitmap Index/Heap Scan is that the former will read pages in the order defined by the index, while the latter will create a bitmap of pages to read (possibly from multiple indexes), order the results, and read them in [heap] order.
Correct.
There are also index-only scans, where the index is read to satisfy the query directly, and there's no heap fetch for most of the index pages read. Heap fetches can still be required if the system can't be sure the pages are visible to all transactions, so it's like an index scan with a shortcut.
It seems to me that in both cases, one would need to read the pages of the index to actually determine the resulting data pages to read.
Correct.
However, in the "Index Scan" the line for "Buffers" has 160 pages, which based on my other tests is the number of pages of actual data, and doesn't include the 32 pages I saw above in the Bitmap Index Scan for the index itself.
I think this is the misunderstanding. That line will be the sum of pages read from the index and from the heap. Not all index pages, and not all heap pages, will need to be read.
Another important note is that the "buffers" metrics, like other metrics in an EXPLAIN, are cumulative: the number of pages read in the parent node includes that of the child nodes. Thus the 160 buffers hit include the index reads. In actuality, only 128 pages were read from the heap in both of the final cases.
How does PostgreSQL knows by just a bitmap anything about rows' physical order?
The bitmap is one bit per heap page. The bitmap index scan sets the bits based on the heap page address that the index entry points to.
So when it goes to do the bitmap heap scan, it just does a linear table scan, reading the bitmap to see whether it should bother with a particular page or seek over it.
Or generates the bitmap so that any element of it can be mapped to the pointer to a page easily?
No, the bitmap corresponds 1:1 to heap pages.
I wrote some more on this here.
OK, it looks like you might be misunderstanding what "bitmap" means in this context.
It's not a bit string like "101011" that's created for each heap page, or each index read, or whatever.
The whole bitmap is a single bit array, with as many bits as there are heap pages in the relation being scanned.
One bitmap is created by the first index scan, starting off with all entries 0 (false). Whenever an index entry that matches the search condition is found, the heap address pointed to by that index entry is looked up as an offset into the bitmap, and that bit is set to 1 (true). So rather than looking up the heap page directly, the bitmap index scan looks up the corresponding bit position in the bitmap.
The second and further bitmap index scans do the same thing with the other indexes and the search conditions on them.
Then each bitmap is ANDed together. The resulting bitmap has one bit for each heap page, where the bits are true only if they were true in all the individual bitmap index scans, i.e. the search condition matched for every index scan. These are the only heap pages we need to bother to load and examine. Since each heap page might contain multiple rows, we then have to examine each row to see if it matches all the conditions - that's what the "recheck cond" part is about.
One crucial thing to understand with all this is that the tuple address in an index entry points to the row's ctid
, which is a combination of the heap page number and the offset within the heap page. A bitmap index scan ignores the offsets, since it'll check the whole page anyway, and sets the bit if any row on that page matches the condition.
Graphical example
Heap, one square = one page:
+---------------------------------------------+
|c____u_____X___u___X_________u___cXcc______u_|
+---------------------------------------------+
Rows marked c match customers pkey condition.
Rows marked u match username condition.
Rows marked X match both conditions.
Bitmap scan from customers_pkey:
+---------------------------------------------+
|100000000001000000010000000000000111100000000| bitmap 1
+---------------------------------------------+
One bit per heap page, in the same order as the heap
Bits 1 when condition matches, 0 if not
Bitmap scan from ix_cust_username:
+---------------------------------------------+
|000001000001000100010000000001000010000000010| bitmap 2
+---------------------------------------------+
Once the bitmaps are created a bitwise AND is performed on them:
+---------------------------------------------+
|100000000001000000010000000000000111100000000| bitmap 1
|000001000001000100010000000001000010000000010| bitmap 2
&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&
|000000000001000000010000000000000010000000000| Combined bitmap
+-----------+-------+--------------+----------+
| | |
v v v
Used to scan the heap only for matching pages:
+---------------------------------------------+
|___________X_______X______________X__________|
+---------------------------------------------+
The bitmap heap scan then seeks to the start of each page and reads the page:
+---------------------------------------------+
|___________X_______X______________X__________|
+---------------------------------------------+
seek------->^seek-->^seek--------->^
| | |
------------------------
only these pages read
and each read page is then re-checked against the condition since
there can be >1 row per page and not all necessarily match
the condition.
Best Answer
Index Only: Will not visit the heap at all for a block if the block is marked as all-visible in the visibility map. If it does need to visit a block, it might do so on multiple occasions, and those occasions might be separated from each other by a lot of time because multiple distant index entries can point to the same block. During the intervening time, the block might have been pushed out of the cache and so need to be read back in.
Bitmap: Cannot use the visibility map to avoid visiting a block. All visits to the same block will occur back-to-back, so it is unlikely to get shoved out of cache between visits. It also visits in physical order even when the index correlation is low, and so gets credited with having its table heap IO be more sequential and less random. But that is only important when you read a substantial part of the table, meaning fetching a lot of rows.
Higher
pg_class.relallvisible
means it is more likely than an IOS will not need to visit the table heap at all for a block, and so favors IOS. Keeping relallvisible high requires vacuuming.Higher
pg_stats.correlation
for the table column that is the leading column of the index means that multiple visits to a block are likely to occur from the same region of the index, and so the block is unlikely to be shoved out of the cache between visits when using an IOS, and so favors an IOS.Higher values of
effective_cache_size
means blocks are estimated to stay in the cache for longer meaning repeated distant visits are likely to find them in the cache still, which again favors IOS.These have a complex relationship. If all blocks are marked allvisible, then the correlation and effective_cache_size do not matter, as there will not be any heap visits at all.