PostgreSQL – Using tsvectors with gin_trgm_ops for Full-Text Search

full-text-searchindexpostgresql

I have a table like so.

ALTER TABLE clients ADD tsv tsvector;
CREATE INDEX index_clients_on_tsv
  ON clients
  USING gin (tsv);
CREATE TRIGGER clientstsvectorupdate BEFORE INSERT OR UPDATE
  ON clients FOR EACH ROW EXECUTE PROCEDURE
  tsvector_update_trigger(
    tsv, 'pg_catalog.english', first_name, middle_name, last_name, email, birthday_string
  );

But then I noticed that fuzzy searching was not working: "Chris", would not look up "Christopher". I did a little digging and found gin_trgm_ops Naturally the first thing I tried to do was update the index via

CREATE INDEX index_clients_on_tsv
  ON clients
  USING gin (tsv gin_trgm_ops);

But now I see an error relating to the fact I'm passing a tsvector to gin_trgm_ops.

(Postgrex.Error) ERROR 42804 (datatype_mismatch): operator class "gin_trgm_ops" does not accept data type tsvector

What is the best way to address what I'm trying to implement?

Best Answer

The gin_trgm_ops is an operator class over text. Your tsv column is a tsvector, and not text. So simply that won't work. A tsvector stores lexemes not words. That means the data you're searching for may not even be there. Observe,

SELECT to_tsvector('Christopher');
  to_tsvector  
---------------
 'christoph':1
(1 row)

That means after you convert Christopher to a tsvector, you lose some data in the algorithmic affix stemmer. That's fine for FTS, it's part of the compromise and how you achieve the amazing speed. It may not be desirable if that's what you're looking for though, so you have two options.

  • Use lexeme prefix matching with :* (assuming the lexeme has the needed information, if it doesn't you may be able to hack either the text search configuration to not-stem, or the stemmer to stem to your workload)

    ## returns false
    SELECT to_tsvector('Christopher') @@ to_tsquery('Chris');
    ## returns true
    SELECT to_tsvector('Christopher') @@ to_tsquery('Chris:*');
    
  • Index the actual column you're using to generate the TSV or wish to search in, using the trigram operator classes (like gin_trgm_ops). If you're always doing left-anchored search, you may consider the more efficient text_pattern_ops.