Postgresql – Optimize Bitmap Heap Scan

optimizationperformancepostgresqlpostgresql-performancequery-performance

I am trying to understand why my query takes a really long time even though I have the required columns indexed:

SELECT entity_id,
       id,
       report_date
FROM own_inst_detail
WHERE ( own_inst_detail.id = 'P7M7WC-S' )
  AND ( own_inst_detail.report_date >= '2017-02-01T17:29:49.661Z' )
  AND ( own_inst_detail.report_date <= '2018-08-01T17:29:49.663Z' )

The cached result of EXPLAIN ANALYZE is as follows:

Bitmap Heap Scan on own_inst_detail (cost=20.18..2353.55 rows=597 width=22) (actual time=1.471..6.955 rows=4227 loops=1)
  Recheck Cond: ((id = 'P7M7WC-S'::bpchar) AND (report_date >= '2017-06-01'::date) AND (report_date <= '2018-08-01'::date))
  Heap Blocks: exact=4182
  ->  Bitmap Index Scan on own_inst_detail  (cost=0.00..20.03 rows=597 width=0) (actual time=0.901..0.901 rows=4227 loops=1)
        Index Cond: ((id = 'P7M7WC-S'::bpchar) AND (report_date >= '2017-06-01'::date) AND (report_date <= '2018-08-01'::date))
Planning time: 0.123 ms
Execution time: 7.801 ms

This part of the query takes 4 out of 5 total seconds that my full query takes.

I have combined index for id and report_date. I also have two standalone indexes for those columns.

I've tried playing around with setting high work_mem as well as lowering the random_page_cost but nothing really helps.

Any additional suggestions are greatly appreciated.

I found similar question How to index WHERE (start_date >= '2013-12-15') which recommends adding a B-Tree index, but I already do have index for report_date.

Create table script:

CREATE TABLE IF NOT EXISTS public.own_inst_detail (
    entity_id character(8) NOT NULL,
    id character(8) NOT NULL,
    report_date date NOT NULL,
    PRIMARY KEY(report_date)
);

Index:

CREATE INDEX indx_own_inst_detail_report_date_desc ON own_inst_detail (report_date DESC NULLS LAST)

CREATE INDEX indx_own_inst_detail_id_report_date_desc ON own_inst_detail (id, report_date DESC NULLS LAST)

Best Answer

Data types

The data type character(n) is almost always the wrong choice. It's a "legacy" type I wouldn't use any more. It exhibits surprising behavior and does nothing text / varchar / varchar(n) couldn't do better.

And you are mixing type date in the table with timestamp literals in the query. While this works, you should at least provide explicit type declaration to defend against surprising results. Better yet, provide actual dates in the query or cast input explicitly. Like

report_date >= timestamp '2017-02-01T17:29:49.661Z'

Or:

report_date >= date '2017-02-02'

The manual about constants and type casts.

Index

Your index indx_own_inst_detail_id_report_date_desc on (id, report_date DESC NULLS LAST) looks good for the query, and the query plan you provided makes good use of it. Execution time: 7.801 ms does not seem too bad either.

If your "uncached" query takes 4 seconds, then you may have to work on your hardware or server configuration or both. Slow storage and not enough RAM for cache? More work_mem is not the cure for this, may even make it worse by taking away RAM from cache memory. Related:

If you have enough RAM and proper memory settings this might burn down to an issue of cold cache: Only the first invocation is slow (or the first few invocations). If that's a problem, consider pg_prewarm. See:

If your table is vacuumed enough (or mostly read-only) you may profit from index-only scans if you append the one additional column entity_id in your SELECT list to the index:

CREATE INDEX ON own_inst_detail (id, report_date DESC NULLS LAST, entity_id)

That might be particularly useful in your situation since Postgres only accesses the index in this case and does not need to visit the table at all. (Eliminating the Bitmap Heap Scan completely.) May help with your disk / cold cache bottleneck.

Related: