Postgresql – What do the shared buffers metrics for various types of index scans represent

explainindexpostgresqlpostgresql-9.3

TL;DR: When an "EXPLAIN (BUFFERS)" shows an "Index Scan", it does not show the number of pages needed to be read from the index. Is it just omitting that number, or does it actually not read pages (and I misunderstand how the index works)?

We have a large read-only table, that looks like this:

database=> \d my_table
    Table "my_table"
        Column         |       Type       | Modifiers
-----------------------+------------------+-----------
 id                    | integer          |
 date                  | date             |
 country_id            | smallint         |
 ...other columns...
Indexes:
    "my_table_id_date_idx" btree (id, date)

and a typical query against this table has an EXPLAIN like this…

database=> EXPLAIN (BUFFERS, ANALYZE) SELECT id, date, country_id FROM my_table WHERE id = 50 AND date BETWEEN '2015-04-01' AND '2015-04-07';
                                                                                    QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on my_table  (cost=448.40..70400.85 rows=18810 width=10) (actual time=9.011..1447.817 rows=10224 loops=1)
   Recheck Cond: ((id = 50) AND (date >= '2015-04-01'::date) AND (date <= '2015-04-07'::date))
   Buffers: shared hit=232 read=9994
   ->  Bitmap Index Scan on my_table_id_date_idx  (cost=0.00..443.69 rows=18810 width=0) (actual time=6.467..6.467 rows=10224 loops=1)
         Index Cond: ((id = 50) AND (date >= '2015-04-01'::date) AND (date <= '2015-04-07'::date))
         Buffers: shared hit=2 read=30
 Total runtime: 1450.175 ms
(7 rows)

Here is how I'm analyzing this:

  1. The query will perform a "Bitmap Index Scan", which requires it to read 32 pages. Up until now, I've assumed these 32 pages contain the data for the index itself. In the end, it creates a list of pages to read.
  2. The query then performs the "Bitmap Heap Scan", which will order the pages to read, then read them. In the end, this reads over 10k pages.

We've recently changed this (read-only) table to CLUSTER on its index, which has dramatically reduced the pages needed to be read. Now an explain looks like this…

database=> EXPLAIN (BUFFERS, ANALYZE) SELECT id, date, country_id FROM my_table WHERE id = 50 AND date BETWEEN '2015-04-01' AND '2015-04-07';
                                                                                              QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using my_table_id_date_idx on my_table  (cost=0.57..36703.31 rows=20594 width=10) (actual time=0.029..4.830 rows=10224 loops=1)
   Index Cond: ((id = 50) AND (date >= '2015-04-01'::date) AND (date <= '2015-04-07'::date))
   Buffers: shared hit=160
 Total runtime: 6.658 ms
(4 rows)

In this case, it's doing an Index Scan. As best as I know, the difference between an Index scan and a Bitmap Index/Heap Scan is that the former will read pages in the order defined by the index, while the latter will create a bitmap of pages to read (possibly from multiple indexes), order the results, and read them in order. It seems to me that in both cases, one would need to read the pages of the index to actually determine the resulting data pages to read.

However, in the "Index Scan" the line for "Buffers" has 160 pages, which based on my other tests is the number of pages of actual data, and doesn't include the 32 pages I saw above in the Bitmap Index Scan for the index itself.

database=> set enable_indexscan to false;
SET
database=> EXPLAIN (BUFFERS, ANALYZE) SELECT id, date, country_id FROM my_table WHERE id = 50 AND date BETWEEN '2015-04-01' AND '2015-04-07';
                                                                                    QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on my_table  (cost=491.14..76781.03 rows=20594 width=10) (actual time=1.003..3.873 rows=10224 loops=1)
   Recheck Cond: ((id = 50) AND (date >= '2015-04-01'::date) AND (date <= '2015-04-07'::date))
   Buffers: shared hit=160
   ->  Bitmap Index Scan on my_table_id_date_idx  (cost=0.00..486.00 rows=20594 width=0) (actual time=0.981..0.981 rows=10224 loops=1)
         Index Cond: ((id = 50) AND (date >= '2015-04-01'::date) AND (date <= '2015-04-07'::date))
         Buffers: shared hit=32
 Total runtime: 5.595 ms
(7 rows)

Does this imply that the index scan does not actually read from the index? How does it know what pages to read, then? Is the number of index pages read just omitted in the Index Scan explain output? Or perhaps my understanding of what that value in the Bitmap Index Scan represents is incorrect.

Best Answer

In this case, it's doing an Index Scan. As best as I know, the difference between an Index scan and a Bitmap Index/Heap Scan is that the former will read pages in the order defined by the index, while the latter will create a bitmap of pages to read (possibly from multiple indexes), order the results, and read them in [heap] order.

Correct.

There are also index-only scans, where the index is read to satisfy the query directly, and there's no heap fetch for most of the index pages read. Heap fetches can still be required if the system can't be sure the pages are visible to all transactions, so it's like an index scan with a shortcut.

It seems to me that in both cases, one would need to read the pages of the index to actually determine the resulting data pages to read.

Correct.

However, in the "Index Scan" the line for "Buffers" has 160 pages, which based on my other tests is the number of pages of actual data, and doesn't include the 32 pages I saw above in the Bitmap Index Scan for the index itself.

I think this is the misunderstanding. That line will be the sum of pages read from the index and from the heap. Not all index pages, and not all heap pages, will need to be read.

Another important note is that the "buffers" metrics, like other metrics in an EXPLAIN, are cumulative: the number of pages read in the parent node includes that of the child nodes. Thus the 160 buffers hit include the index reads. In actuality, only 128 pages were read from the heap in both of the final cases.