I have a compound LIKE
in a query that would have seemed to have benefited from a gin
index however on two practically identical servers I see vastly different performance, with the worst case being almost as slow as a Seq Scan
:
Bad:
EXPLAIN (ANALYZE, BUFFERS) SELECT nodes.node_path FROM nodes WHERE nodes.node_path LIKE 'company%/path%';
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on nodes (cost=2114.03..17149.31 rows=3875 width=42) (actual time=11000.379..11036.533 rows=52 loops=1)
Recheck Cond: ((node_path)::text ~~ 'company%/path%'::text)
Rows Removed by Index Recheck: 46597
Heap Blocks: exact=27913
Buffers: shared hit=73481
-> Bitmap Index Scan on nodes_node_path_gin_idx (cost=0.00..2113.06 rows=3875 width=0) (actual time=10992.576..10992.576 rows=46649 loops=1)
Index Cond: ((node_path)::text ~~ 'company%/path%'::text)
Buffers: shared hit=45568
Planning time: 0.245 ms
Execution time: 11036.769 ms
(10 rows)
Good:
EXPLAIN (ANALYZE, BUFFERS) SELECT nodes.node_path FROM nodes WHERE nodes.node_path LIKE 'company%/path%';
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on nodes (cost=2528.05..20571.91 rows=4651 width=43) (actual time=1329.024..1354.603 rows=56 loops=1)
Recheck Cond: ((node_path)::text ~~ 'company%/path%'::text)
Rows Removed by Index Recheck: 48704
Heap Blocks: exact=16106
Buffers: shared hit=126080
-> Bitmap Index Scan on nodes_node_path_gin_idx (cost=0.00..2526.89 rows=4651 width=0) (actual time=1324.801..1324.802 rows=48760 loops=1)
Index Cond: ((node_path)::text ~~ 'company%/path%'::text)
Buffers: shared hit=109974
Planning time: 0.237 ms
Execution time: 1354.748 ms
(10 rows)
For index:
CREATE INDEX nodes_node_path_gin_idx ON public.nodes USING gin (node_path gin_trgm_ops)
Where node_path
stores slash delimited file paths.
These hosts otherwise exhibit very similar performance, and there's seemingly no disk I/O or other factors obviously different between these executions.
I enabled track_io_timing
, I/O Timings
is not emitted, and BUFFERS
indicates Buffers: shared hit=43783
. I dumped this column to a local 9.6.16 instance and it runs in milliseconds, but on the RDS Postgres 9.6.11 instance it's always 10 seconds:
EXPLAIN (ANALYZE, BUFFERS) SELECT nodes.node_path FROM nodes WHERE nodes.node_path LIKE 'company%/path%';
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on nodes (cost=403.14..15246.51 rows=4018 width=42) (actual time=76.402..76.421 rows=52 loops=1)
Recheck Cond: ((node_path)::text ~~ 'company%/path%'::text)
Rows Removed by Index Recheck: 48576
Heap Blocks: exact=506
Buffers: shared hit=8530
-> Bitmap Index Scan on nodes_node_path_gin_idx (cost=0.00..402.13 rows=4018 width=0) (actual time=71.494..71.494 rows=48628 loops=1)
Index Cond: ((node_path)::text ~~ 'company%/path%'::text)
Buffers: shared hit=8024
Planning time: 0.188 ms
Execution time: 76.452 ms
(10 rows)
How do I get information on where the cost is for the Bitmap Index Scan
itself – I assume it's something specific to the implementation of the ~~
operator for the gin
index type, but is there additional instrumentation I can enable to understand the unexplained time?
Best Answer
Substantial improvements were made to pg_trgm in PostgreSQL v9.6, pg_trgm v1.3.
But if you upgraded your server from before 9.6 to 9.6 or later using "pg_upgrade", you will not benefit from these improvements until you do
alter extension pg_trgm update TO "1.3"
.