Postgresql – Slow Index scan on PostgreSQL

performancepostgresql

I have a table named foo_partition_20210501, which has a size of ~10GB and contains ~100M rows. As you may guess it is a partition of a partitioned table, but I don't think has something to do with the problem I'm encountering, as I'm only working on one partition at a time.

The table has multiple columns, but three are of interest:

  • item_id: integer
  • item_date: date
  • code: custom type enum_code

Many of the records of the table have a code = 'A', and I often filter on the tuple (item_id, item_date). So I've created the following partial index:

CREATE INDEX ix_foo_partition_20210501_partial_index 
ON public.foo_partition_20210501 
USING btree (item_date, item_id) 
WHERE (code <> 'A'::enum_code)

The partial index is ~50MB large.

When I query the table such as

SELECT * 
FROM foo_partition_20210501 
WHERE item_date='20210521' 
AND item_id=137415 
AND code IN ('B', 'C')

The resulting plan (explain analyze+buffers) contains this:

->  Index Scan using ix_foo_partition_20210501_partial_index on foo_partition_20210501  (cost=0.43..18.00 rows=8 width=62) (actual time=460.775..20533.036 rows=63 loops=1)
                                                                                            
Index Cond: ((item_date = '2021-05-21'::date) AND (item_id = 137415))
"Filter: (code = ANY ('{B,C}'::enum_code[]))"
Rows Removed by Filter: 1841
Buffers: shared hit=2 read=1124 dirtied=76
I/O Timings: read=20418.129

So the query plan is correct and does use my partial index, but is very slow (20s). It does read the disk instead of the cache, but I still don't understand why it would be so slow. The index is fairly small (only 50MB), and the number of overall rows fetched is less than 2000 with only 63 being kept.

Do you have any insight on why this is slow ? is it normal disk read performance for PostgreSQL ?

Details of the platform:

  • PostgreSQL 12.5 on x86_64-pc-linux-gnu, compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bit
  • SAN Disk

Running the same query multiple times:

When hitting the cache, it's very fast.

Index Scan using ix_foo_partition_20210501_partial_index on foo_partition_20210501  (cost=0.43..18.00 rows=8 width=62) (actual time=0.080..4.672 rows=63 loops=1)
Index Cond: ((foo_date = '2021-05-21'::date) AND (foo_id = 137415))
"Filter: (code = ANY ('{B,C}'::enum_code[]))"
Rows Removed by Filter: 1843
Buffers: shared hit=1128

Some times hitting the disk is still slow but only 8s here:

Index Scan using ix_foo_partition_20210501_partial_index on foo_partition_20210501  (cost=0.43..18.00 rows=8 width=62) (actual time=30.511..8135.828 rows=63 loops=1)
Index Cond: ((foo_date = '2021-05-21'::date) AND (foo_id = 137415))
"Filter: (code = ANY ('{B,C}'::enum_code[]))"
Rows Removed by Filter: 1843
Buffers: shared hit=4 read=1124
I/O Timings: read=8084.544

Best Answer

Reading 1124 blocks to get 1904 rows is unfortunate, but normal.

What is not normal is the I/O time: almost 20 ms for a single 8kB block. Either your I/O system is really overloaded, or the disk is amazingly slow.