Postgresql – Optimize very simple Postgresql query (and understand Query Plan)

performancepostgresqlquery-performance

I have a DB with many tables created using the following structure :

CREATE TABLE table1 (id serial primary key, col1 real, col2 real, ..., coln real)

with about 50 columns and 400k rows. The tables are read-only but are fully dropped and recreated once a day (with new values) and are hosted on a HDD.

When I run a simple query based on the PK:

EXPLAIN ANALYZE SELECT col1, col12, col22, col17, col41, col27 FROM table1 WHERE table1.id=461 OR table1.id=7618 OR table1.id=45581 OR table1.id=5238;

I get the following QUERY PLAN :

Bitmap Heap Scan on table1  (cost=9.72..17.76 rows=4 width=24) (actual time=28.290..45.165 rows=4 loops=1)
   Recheck Cond: ((id = 461) OR (id = 7618) OR (id = 45581) OR (id = 5238))
   Heap Blocks: exact=4
   ->  BitmapOr  (cost=9.72..9.72 rows=4 width=0) (actual time=16.748..16.748 rows=0 loops=1)
         ->  Bitmap Index Scan on table1_pkey  (cost=0.00..2.43 rows=1 width=0) (actual time=11.230..11.230 rows=1 loops=1)
               Index Cond: (id = 461)
         ->  Bitmap Index Scan on table1_pkey  (cost=0.00..2.43 rows=1 width=0) (actual time=0.832..0.832 rows=1 loops=1)
               Index Cond: (id = 7618)
         ->  Bitmap Index Scan on table1_pkey  (cost=0.00..2.43 rows=1 width=0) (actual time=4.476..4.476 rows=1 loops=1)
               Index Cond: (id = 45581)
         ->  Bitmap Index Scan on table1_pkey  (cost=0.00..2.43 rows=1 width=0) (actual time=0.208..0.208 rows=1 loops=1)
               Index Cond: (id = 5238)
 Planning time: 0.074 ms
 Execution time: 45.197 ms
(14 rows)

If a run the exact same query again, the QUERY PLAN becomes:

Bitmap Heap Scan on table1  (cost=9.72..17.76 rows=4 width=24) (actual time=0.031..0.036 rows=4 loops=1)
   Recheck Cond: ((id = 461) OR (id = 7618) OR (id = 45581) OR (id = 5238))
   Heap Blocks: exact=4
   ->  BitmapOr  (cost=9.72..9.72 rows=4 width=0) (actual time=0.024..0.024 rows=0 loops=1)
         ->  Bitmap Index Scan on table1_pkey  (cost=0.00..2.43 rows=1 width=0) (actual time=0.012..0.012 rows=1 loops=1)
               Index Cond: (id = 461)
         ->  Bitmap Index Scan on table1_pkey  (cost=0.00..2.43 rows=1 width=0) (actual time=0.005..0.005 rows=1 loops=1)
               Index Cond: (id = 7618)
         ->  Bitmap Index Scan on table1_pkey  (cost=0.00..2.43 rows=1 width=0) (actual time=0.003..0.003 rows=1 loops=1)
               Index Cond: (id = 45581)
         ->  Bitmap Index Scan on table1_pkey  (cost=0.00..2.43 rows=1 width=0) (actual time=0.003..0.003 rows=1 loops=1)
               Index Cond: (id = 5238)
 Planning time: 0.078 ms
 Execution time: 0.074 ms
(14 rows)

The first question is why in the first query the actual time of the Beatmap Heap Scan is higher than the expected cost ?

Also, is there a configuration that would improve the query performance ?

Many thanks in advance.

PS :due to RAM limitation I can not use pg-warm tool

Best Answer

You need to understand that the costs provided are not absolute values. It basically has no other meaning than helping the planner comparing plans. So you can't compare cost (that has no unit) to actual time (that's given in ms). You can compare costs between two plans (provided by the same postgres server, because it will depend on configuration parameters and sometimes Postgres version).

Then (and I think you figured it out by yourself), the performance difference between first time and second time is that the second time data are cached, so getting them is way faster...

Your query runs in 45ms when data are not cached... That's pretty good and it wouldn't need farther optimization normally. Remember the first rule for optimization is

Don't do it

But I think you simplified your problem here so we could understand it. If your query is way more slower that what you showed and does use OR, remember it's quite expensive (in time and resource). You can always rewrite your query like that :

EXPLAIN ANALYZE SELECT col1, col12, col22, col17, col41, col27 FROM table1 WHERE table1.id IN(461,7618,45581,5238);

or like that :

EXPLAIN ANALYZE
(SELECT col1, col12, col22, col17, col41, col27 FROM table1 WHERE table1.id=461
UNION ALL
SELECT col1, col12, col22, col17, col41, col27 FROM table1 WHERE table1.id=7618
UNION ALL
SELECT col1, col12, col22, col17, col41, col27 FROM table1 WHERE table1.id=45581
UNION ALL
SELECT col1, col12, col22, col17, col41, col27 FROM table1 table1.id=5238);

Or (like a_horse_with_no_name suggested) with a join on a values clause :

EXPLAIN ANALYZE SELECT col1, col12, col22, col17, col41, col27 FROM table1 INNER JOIN (VALUES (461),(7618),(45581),(5238)) AS t(id) ON table1.id = t.id;

You might find usefull the using explain official documentation page.