Postgresql – Postgres function index doesn’t work correctly with regular expressions

functionsindexpostgresqlregular expression

I have this Postgres function. It forces text to be lowercase and all nulls to become empty strings, so that I more cleanly perform searches for things that don't match something, etc:

CREATE OR REPLACE FUNCTION magic_text(txt text) RETURNS text
IMMUTABLE PARALLEL SAFE
LANGUAGE SQL AS $$
  SELECT lower(coalesce(txt,''))
$$;

This function is heavily used in many different query types, so I created all of the varieties of text index for it:

CREATE INDEX index_magic_cards_on_oracle_text_magic 
ON magic_cards 
USING BTREE (magic_text(oracle_text)) WITH (fillfactor = 100);

CREATE INDEX index_magic_cards_on_oracle_text_magic_text_pattern 
ON magic_cards 
USING BTREE (magic_text(oracle_text) text_pattern_ops) WITH (fillfactor = 100);

CREATE INDEX index_magic_cards_on_oracle_text_magic_gist_trgm 
ON magic_cards 
USING GIST (magic_text(oracle_text) gist_trgm_ops);

CREATE INDEX index_magic_cards_on_oracle_text_magic_gin_trgm 
ON magic_cards 
USING GIN (magic_text(oracle_text) gin_trgm_ops)

These indexes are interfering somehow with some (but not all) kinds of complex regular expression searches. I haven't been able to determine if specific regex symbols or features cause the problem.

Here's an example (explain.depesz):

SELECT card_name 
FROM magic_cards 
WHERE magic_text(oracle_text) ~ '***:(?n)eldrazi\ (?!scion)';

This returns nothing, and according to the query planner, it performs a bitmap index scan on index_magic_cards_on_oracle_text_magic_gin_trgm.

This expanded query also scans the same index, and fails to find anything (explain.depesz):

SELECT card_name 
FROM magic_cards 
WHERE lower(coalesce(oracle_text, '')) ~ '***:(?n)eldrazi\ (?!scion)';

However, if I force Postgres to not use my index, this query has results! (explain.depesz) The following has multiple rows of results and performs a sequential scan on the table. All I did below was change the coalesce fallback to , which shouldn't affect the results:

SELECT card_name 
FROM magic_cards 
WHERE lower(coalesce(oracle_text, '⌘')) ~ '***:(?n)eldrazi\ (?!scion)';

Why does using the index change the results for regular expression searches?

(Postgres 9.6.1, pg_trgm 1.3)

Best Answer

This looks like a live bug in pg_trgm to me. I can strip away much of the stuff and still reproduce it with this simple test case:

create table foobar (x text);
insert into foobar values ('eldrazi scion'),('eldrazi scio');
create extension pg_trgm ;
create index  on foobar using gin (x gin_trgm_ops);
select * from foobar where x  ~ 'eldrazi (?!scion)';  -- returns 1 row
set enable_seqscan TO off;
select * from foobar where x  ~ 'eldrazi (?!scion)'; -- returns 0 rows

Note that a similar bug was fixed recently (in yet-to-be-released code, commit 16500d2278ab3dd), but that fix does not fix this bug.

I've reported this bug myself, as bug #14623.

And it has now been fixed. When 9.6.3 is released, it should contain the bug-fix.