I have a table residences
…
I installed trgm: create extension pg_trgm;
After adding trigram indexes to both grp
and name
columns using:
CREATE INDEX residences_name_trgm ON residences USING GIN (name gin_trgm_ops);
CREATE INDEX residences_grp_trgm ON residences USING GIN (grp gin_trgm_ops);
And checking performance of a simple query… it appears that the total runtime for a query on the name
column runs ~100x faster. Why is this?
EXPLAIN ANALYZE SELECT * FROM residences WHERE name ILIKE '%Sutton%';
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on residences (cost=36.02..47.90 rows=3 width=1872) (actual time=0.390..0.720 rows=21 loops=1)
Recheck Cond: ((name)::text ~~ '%Sutton%'::text)
-> Bitmap Index Scan on residences_name_trgm_gin (cost=0.00..36.02 rows=3 width=0) (actual time=0.354..0.354 rows=21 loops=1)
Index Cond: ((name)::text ~~ '%Sutton%'::text)
Total runtime: 0.814 ms
(5 rows)
and
EXPLAIN ANALYZE SELECT * FROM residences WHERE grp ILIKE '%Sutton%';
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on residences (cost=97.93..8879.41 rows=5927 width=1872) (actual time=5.516..115.634 rows=5968 loops=1)
Recheck Cond: ((grp)::text ~~ '%Sutton%'::text)
-> Bitmap Index Scan on residences_grp_trgm_gin (cost=0.00..96.45 rows=5927 width=0) (actual time=4.366..4.366 rows=5968 loops=1)
Index Cond: ((grp)::text ~~ '%Sutton%'::text)
Total runtime: 119.779 ms
(5 rows)
For reference, both name
and grp
columns are CHARACTER VARYING(50)
.
Futhermore… counting distinct values in each column yields:
SELECT COUNT(DISTINCT grp) FROM residences;
421
SELECT COUNT(DISTINCT name) FROM residences
7750
That is, the name
column actually has ~20x more distinct values, but returns values much faster.
More information on this technique can be found on depesz's article WAITING FOR 9.1 – FASTER LIKE/ILIKE and another similar tutorial on the PalominoDB blog.
Best Answer
Exactly so. An index on
name
is more selective than an index ongrp
, so index use is more effective.This is confirmed by the fact that your first query, on the
name
field, the bitmap index scan onresidences_grp_trgm_gin
gets only 21 results as shown byactual( ... rows=21 ...)
in theexplain analyze
output. Ongrp
you get 5968 results.In a bitmap index scan every heap read and re-check has a significant cost, so it makes perfect sense that finding 21 rows is going to be a lot faster than finding 5968.
BTW, please always show
select version()
and your server configuration, as well as anything else that's relevant from the wiki's guide to reporting problems.