I use PostgreSQL 9.4 and I have a big table named foo. I want to search it, but I get long execution times when the search text is very short (e.g. "v") or long (e.g. "This is a search example with gin on table foo%"). In these cases my index is ignored. Here is my search query:
EXPLAIN (ANALYZE, TIMING)
SELECT "foo".* FROM "foo" WHERE "foo"."locale" = 'de'
AND f_unaccent(foo.name) ILIKE f_unaccent('v%')
AND foo.configuration->'bar' @> '{"is":["a"]}'
LIMIT 100;
This is my index:
CREATE INDEX index_foo_on_name_de_gin ON foo USING gin(f_unaccent(name) gin_trgm_ops) WHERE locale = 'de';
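This assumes the pg_trgm and unaccent extensions are installed (once per database):

```sql
CREATE EXTENSION IF NOT EXISTS pg_trgm;   -- provides gin_trgm_ops
CREATE EXTENSION IF NOT EXISTS unaccent;  -- provides unaccent()
```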
Why is the index ignored in favor of a seq scan and/or bitmap heap scan? What other index can I add to solve this problem?
Why does it do a recheck?
Recheck Cond: ((f_unaccent((name)::text) ~~* 'v%'::text) AND ((locale)::text = 'de'::text))
Function f_unaccent:
CREATE OR REPLACE FUNCTION f_unaccent(text)
RETURNS text AS
$func$
SELECT unaccent('unaccent', $1)
$func$ LANGUAGE sql IMMUTABLE SET search_path = public, pg_temp;
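A quick sanity check of the function (the output assumes the default unaccent dictionary):

```sql
SELECT f_unaccent('Hôtel');  -- 'Hotel' with the default unaccent dictionary
```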
Query Plan:
Limit (cost=24412.85..67568.91 rows=100 width=301) (actual time=21838.473..21838.473 rows=0 loops=1)
Buffers: shared hit=1 read=749976
-> Bitmap Heap Scan on foo (cost=24412.85..4595502.73 rows=10592 width=301) (actual time=21838.470..21838.470 rows=0 loops=1)
Recheck Cond: ((f_unaccent((name)::text) ~~* 'v%'::text) AND ((locale)::text = 'de'::text))
Rows Removed by Index Recheck: 5416739
Filter: ((configuration -> 'bar'::text) @> '{"is": ["a"]}'::jsonb)
Rows Removed by Filter: 2196
Heap Blocks: exact=749172
Buffers: shared hit=1 read=749976
-> Bitmap Index Scan on index_foo_on_name_de_gin (cost=0.00..24410.20 rows=10591544 width=0) (actual time=641.532..641.532 rows=5418935 loops=1)
Index Cond: (f_unaccent((name)::text) ~~* 'v%'::text)
Buffers: shared hit=1 read=804
Planning time: 0.767 ms
Execution time: 21838.549 ms
Table definition:
Column | Type | Modifiers | Storage | Stats target | Description
---------------+-----------------------------+--------------------------------------------------------------+----------+--------------+-------------
id | integer | not null default nextval('foo_id_seq'::regclass) | plain | |
locale | character varying | not null | extended | |
name | character varying | not null | extended | |
configuration | jsonb | not null default '{}'::jsonb | extended | |
Indexes:
    "index_foo_on_configuration" gin (configuration)
    "index_foo_on_name_de_gin" gin (f_unaccent(name::text) gin_trgm_ops) WHERE locale::text = 'de'::text
Without the foo.configuration filter the query is very fast (1.021 ms), but I need this filter. Here is the query without it:
EXPLAIN (ANALYZE, BUFFERS)
SELECT "foo".* FROM "foo" WHERE "foo"."locale" = 'de'
AND f_unaccent(foo.name) ILIKE f_unaccent('v%')
LIMIT 100;
RESULTS WITH CHANGES
- Updated the f_unaccent function
- Added btree index:
CREATE INDEX index_foo_on_name_de ON foo (f_unaccent(name) text_pattern_ops) WHERE locale = 'de';
- Added gin index on configuration
CREATE INDEX index_foo_on_configuration ON foo USING gin(configuration jsonb_path_ops);
- Removed old index
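One caveat worth noting (my observation, not from the original post): a GIN index on the whole configuration column can only support containment tests whose left operand is the column itself, not the expression configuration->'bar'. An equivalent whole-column predicate that index_foo_on_configuration can serve:

```sql
-- Equivalent containment test phrased against the whole column,
-- so the gin (configuration jsonb_path_ops) index is usable for it.
SELECT "foo".* FROM "foo"
WHERE "foo"."locale" = 'de'
  AND configuration @> '{"bar": {"is": ["a"]}}'
LIMIT 100;
```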
A) Query:
EXPLAIN (ANALYZE, BUFFERS)
SELECT "foo".* FROM "foo" WHERE "foo"."locale" = 'de'
AND f_unaccent(foo.name) ILIKE f_unaccent('v%')
AND foo.configuration->'bar' @> '{"0":["s"]}'
LIMIT 100;
A) Query Plan:
Limit (cost=0.00..121248.83 rows=100 width=301) (actual time=16319.267..16319.267 rows=0 loops=1)
Buffers: shared hit=262079 read=1449294
-> Seq Scan on foo (cost=0.00..12842675.96 rows=10592 width=301) (actual time=16319.261..16319.261 rows=0 loops=1)
Filter: (((locale)::text = 'de'::text) AND ((configuration -> 'bar'::text) @> '{"is": ["a"]}'::jsonb) AND (f_unaccent((name)::text) ~~* 'v%'::text))
Rows Removed by Filter: 41227048
Buffers: shared hit=262079 read=1449294
Planning time: 0.765 ms
Execution time: 16319.313 ms (and sometimes more!)
B) Query without configuration:
EXPLAIN (ANALYZE, BUFFERS)
SELECT "foo".* FROM "foo" WHERE "foo"."locale" = 'de'
AND f_unaccent(foo.name) ILIKE f_unaccent('v%') LIMIT 100;
B) Query Plan:
Limit (cost=0.00..119.31 rows=100 width=301) (actual time=0.227..2.912 rows=100 loops=1)
Buffers: shared read=31
-> Seq Scan on foo (cost=0.00..12636540.72 rows=10591544 width=301) (actual time=0.221..2.864 rows=100 loops=1)
Filter: (((locale)::text = 'de'::text) AND (f_unaccent((name)::text) ~~* 'v%'::text))
Rows Removed by Filter: 691
Buffers: shared read=31
Planning time: 0.501 ms
Execution time: 2.985 ms
C) Query without configuration and limit:
EXPLAIN (ANALYZE, BUFFERS)
SELECT "foo".* FROM "foo" WHERE "foo"."locale" = 'de'
AND f_unaccent(foo.name) ILIKE f_unaccent('v%');
C) Query Plan:
Bitmap Heap Scan on foo (cost=346203.46..4864616.26 rows=10591544 width=301) (actual time=23526.443..30050.008 rows=2196 loops=1)
Recheck Cond: ((locale)::text = 'de'::text)
Rows Removed by Index Recheck: 14094842
Filter: (f_unaccent((name)::text) ~~* 'v%'::text)
Rows Removed by Filter: 10781095
Heap Blocks: exact=572873 lossy=847868
Buffers: shared read=1494015
-> Bitmap Index Scan on index_foo_on_name_de (cost=0.00..343555.58 rows=10592603 width=0) (actual time=1788.454..1788.454 rows=10783291 loops=1)
Buffers: shared read=73274
Planning time: 0.528 ms
Execution time: 30050.168 ms
Best Answer
1. f_unaccent()
Seems like you are using my function as defined here:
Note the update I just made. This is better:
Detailed explanation over there.
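The linked update is not reproduced here; my best guess at its shape (an assumption) is a schema-qualified version, so the function no longer depends on a search_path setting and is safe to use in index expressions:

```sql
-- Sketch of the presumed update: schema-qualify the function and the
-- unaccent dictionary instead of pinning search_path.
CREATE OR REPLACE FUNCTION public.f_unaccent(text)
  RETURNS text AS
$func$
SELECT public.unaccent('public.unaccent', $1)
$func$ LANGUAGE sql IMMUTABLE;
```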
2. Recheck

The "Recheck Cond:" line is always present in EXPLAIN output for bitmap index scans - not to worry. A trigram index is inherently lossy for pattern matching, so candidate rows have to be rechecked against the actual condition; that is where "Rows Removed by Index Recheck" comes from. Detailed explanation:

3. Index and query plan
That's a misunderstanding. Your index is obviously not ignored. If Postgres expects to find so many rows that some data pages in the main relation would have to be visited more than once (obviously the case with rows=10591544), it switches from a plain index scan to a bitmap index scan, which is followed by a "Bitmap Heap Scan" to fetch the actual tuples. Details:

What makes this query really expensive is a combination of multiple unfortunate factors:
- Neither the index (Buffers: shared hit=1 read=804) nor the table (Buffers: shared hit=1 read=749976) were cached. If you repeat the query right away, it will be much faster, since all of it is cached by then. This is the worst case possible.
- The search pattern f_unaccent('v%') - or just 'v%' - is a very bad case for a trigram index: not very selective, but still selective enough that the planner uses it instead of an actual sequential scan. A text_pattern_ops index would be much faster for this. See below.
- More selective patterns (longer strings) would also be much faster.
- You had LIMIT 100, so Postgres started out optimistically, hoping to find 100 rows quickly. But the query returns 0 rows (rows=0), meaning Postgres had to walk through all candidate rows unsuccessfully. Another worst-case scenario. Your second predicate is to blame here: Postgres has only very limited statistics for jsonb columns and has no idea how selective that condition is going to be. If you have many queries on configuration->'bar', you could improve the situation drastically with another expression index ... possibly even a multicolumn index.
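Such an expression index could look like this (the index name and the jsonb_path_ops choice are my assumptions; jsonb_path_ops supports only the @> operator, but is smaller and faster for it than the default jsonb_ops):

```sql
-- Hypothetical partial expression index for containment tests on
-- configuration->'bar'; the WHERE clause matches the queries above,
-- which always filter on locale = 'de'.
CREATE INDEX index_foo_on_bar_de ON foo
USING gin ((configuration->'bar') jsonb_path_ops)
WHERE locale = 'de';
```

With such an index in place, the planner can estimate and serve the jsonb predicate directly instead of filtering every candidate row.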
4. text_pattern_ops

For just left-anchored patterns ("right-hand wildcard"), you can make do without a trigram index. But a plain btree index won't do if your DB uses any locale other than the "C" locale (which is effectively "no locale"). In that case you need a special operator class that ignores the locale. Like:
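The operator class meant here is text_pattern_ops, as in the index added in the update above. One caveat (my note, hedged): a btree index serves left-anchored LIKE (~~), not ILIKE (~~*), so to keep case-insensitive search you would index the lowercased expression and lowercase the pattern in the query. Index name below is illustrative:

```sql
-- Btree expression index for left-anchored patterns, independent of locale.
-- Lowercasing both sides makes the search case-insensitive while
-- still allowing plain LIKE, which the btree index can serve.
CREATE INDEX index_foo_on_lower_name_de ON foo
(lower(f_unaccent(name)) text_pattern_ops)
WHERE locale = 'de';
```

Matching query predicate: `lower(f_unaccent(name)) LIKE lower(f_unaccent('v%'))`.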
Details: