PostgreSQL 9.3 – Pagination and Counting Number of Pages

indexpostgresqlpostgresql-9.3

I'm implementing pagination and sorting of rows in, let's say, products table, using multicolumn index on category, score and id.

-- index
create index category_score_id on products(category, score, id)
  where active = true;

-- selecting first page
select * from products where
  category = 1
  and active = true
order by score desc, id desc
limit 30;

-- selecting following pages
select * from products where
  category = 1
  and active = true
  and (score, id) < (893, 102458)  -- values from previous page
order by score desc, id desc
limit 30;

This works perfectly fine and as expected, using only index scan:

-- explain analyze of 2-nd query:
Limit  (cost=0.00..99.52 rows=30 width=2591) (actual time=0.090..0.179 rows=30 loops=1)
  ->  Index Scan Backward using category_score_id on products  (cost=0.00..76435.07 rows=23041 width=2591) (actual time=0.089..0.172 rows=30 loops=1)
        Index Cond: ((category = 1) AND (ROW(score, id) < ROW(893, 102458)))
Total runtime: 0.085 ms

What I am curious about is why the following query requires extra Bitmap Heap Scan:

select count(*) from products where
  category = 1
  and active = true;
-- explain analyze:
Aggregate  (cost=49987.80..49987.81 rows=1 width=0) (actual time=104.847..104.847 rows=1 loops=1)
  ->  Bitmap Heap Scan on products  (cost=538.85..49930.20 rows=23041 width=0) (actual time=8.919..98.952 rows=47937     loops=1)
        Recheck Cond: ((category = 1) AND active)
        Rows Removed by Index Recheck: 93006
        ->  Bitmap Index Scan on category_score_id  (cost=0.00..533.09 rows=23041 width=0) (actual time=7.181..7.181 rows=47937     loops=1)
              Index Cond: (category = 1)
Total runtime: 104.892 ms

Best Answer

It is not doing an extra bitmap scan. It is doing a bitmap scan instead of the regular index scan.

Why is it doing the bitmap scan instead?

Because it thinks it will be faster that way. Without the LIMIT and the ORDER BY, it anticipates that using the bitmap scan will let it do the IO on the table heap in a more efficient manner. You can see if PostgreSQL is correct in this assessment by temporarily doing a set enable_bitmapscan TO off; (although caching effects may make that experiment difficult to interpret correctly.)

Also, the line:

Rows Removed by Index Recheck: 93006

Suggests that your work_mem setting might be too small to accommodate the bitmap efficiently.

Why does it show up in the explain plan as two entries rather than one?

The ordinary index scan is really two scans in one, a scan over the index, and then a nested look-up over the table. But these two parts are integrated, and only show up as one entry in the execution plan.

The bitmap case also has these two steps, but they are not integrated. It is possible to inject extra steps between them, such as a BitmapOr or a BitmapAnd. It reports them as two separate entries to accommodate this possibility.