I have a table like so.
ALTER TABLE clients ADD tsv tsvector;
CREATE INDEX index_clients_on_tsv
ON clients
USING gin (tsv);
CREATE TRIGGER clientstsvectorupdate BEFORE INSERT OR UPDATE
ON clients FOR EACH ROW EXECUTE PROCEDURE
tsvector_update_trigger(
tsv, 'pg_catalog.english', first_name, middle_name, last_name, email, birthday_string
);
But then I noticed that fuzzy searching was not working: "Chris", would not look up "Christopher". I did a little digging and found gin_trgm_ops
Naturally the first thing I tried to do was update the index via
CREATE INDEX index_clients_on_tsv
ON clients
USING gin (tsv gin_trgm_ops);
But now I see an error relating to the fact I'm passing a tsvector
to gin_trgm_ops
.
(Postgrex.Error) ERROR 42804 (datatype_mismatch): operator class "gin_trgm_ops" does not accept data type tsvector
What is the best way to address what I'm trying to implement?
Best Answer
The
gin_trgm_ops
is an operator class overtext
. Yourtsv
column is atsvector
, and nottext
. So simply that won't work. Atsvector
stores lexemes not words. That means the data you're searching for may not even be there. Observe,That means after you convert
Christopher
to atsvector
, you lose some data in the algorithmic affix stemmer. That's fine for FTS, it's part of the compromise and how you achieve the amazing speed. It may not be desirable if that's what you're looking for though, so you have two options.Use lexeme prefix matching with
:*
(assuming the lexeme has the needed information, if it doesn't you may be able to hack either the text search configuration to not-stem, or the stemmer to stem to your workload)Index the actual column you're using to generate the TSV or wish to search in, using the trigram operator classes (like
gin_trgm_ops
). If you're always doing left-anchored search, you may consider the more efficienttext_pattern_ops
.