Postgresql – Handling misspellings when searching across tsvector columns

postgresql

I have a projects table with a tsvector column. The column is populated and refreshed via a trigger function when some of the other columns on a given row change:

CREATE OR REPLACE FUNCTION refresh_project_search_vector() RETURNS trigger AS $$
    BEGIN
        NEW.project_search_vector :=
            setweight(to_tsvector('english', NEW.id::text), 'A') ||
            setweight(to_tsvector('english', coalesce(NEW.name,'')), 'A') ||
            setweight(to_tsvector('english', coalesce(NEW.description,'')), 'B');
        RETURN NEW;
    END;
$$ LANGUAGE plpgsql

I perform searches using:

...
WHERE project_search_vector @@ plainto_tsquery(search_term)
...

This seems to work well.

However, I also want to account for the user misspelling something. For example, if they search for Californa, I want Postgres to account for that and include/match tsvectors that have "California". (I know about stemming, but it doesn't seem sufficient for many types of misspellings like this.)

So my question is, is there a way to use a similarity logic when using tsvectors for full-text search?

Apologies if I'm asking the wrong question – still new to full-text searching.

Best Answer

I don't think there is any built-in way to do this. I think it would usually be handled by spell checking the search phrase before it ever gets submitted to the database (or by spell checking each word in it after it had been submitted and found no results).

You can use pg_trgm to do fuzzy searching which will be likely to find misspellings, but it doesn't integrate with tsvectors the way you apparently want. One way to use that would be to create two indexes, one FTS based on stemming, and the other for fuzzy searching based on trigrams. Then you could do the query one way (the @@ operator over tsvector column), and if it got no results then redo it the less-preferred way (likely using the %>> operator over raw text column/expression).

If you do find something that integrates these the way you want automatically, please post an answer to your own question.