PostgreSQL – Index Trigger/Hook for Synonyms

full-text-searchgin-indexpostgresqlsynonymstrigger

Is it possible, in PostgreSQL, to add a trigger to modify data prior to it going into a full text index?

PostgreSQL requires using files to add custom dictionaries for things like stop words, synonyms, etc. However, on most managed database hosting file access is not allowed. What would be best is if I could use a table (such as a spaced single or double column) instead of a file dictionary to supply the list of synonyms, stop words, etc. but after a lot of searching it does not appear that is possible.

Another path I tried to look into was to populate a table with synonyms then export it to the file system, however PostgreSQL requires a full path to the file. On managed hosting you cannot provide a full path if you do not know the the paths to begin with.

So now I am down to seeing if there is some way I can "hook" into the pre-index process and modify the data myself prior to it being added to an index. Note that I thought about using a trigger and modifying the data directly but that would also modify the data going into the stored row's field – something I don't want to do. What I am looking for is a way to leave the original data / field intact and being entered into a field while modifying the information prior to it being indexed. I thought about adding an additional column to the table with the modified data, however that can get very messy and may require a lot of extra columns for larger data sets. Another option is using materialized views or inherited/child tables but, once again, would be a messy and difficult to manage outcome.

TLDR: I am asking if there is any viable way to define stop words, synonyms, etc. without having to use files (so that it can be used on managed systems that do not provide file access)?

Best Answer

It's all preprocessing. PostgreSQL can do it's thing. You can do yours.

CREATE INDEX ON myTable 
  USING gin ( myTransform(x) );

SELECT *
FROM myTable
WHERE 'foo & bar' @@ myTransform(x);

Exercise left to the OP,

CREATE FUNCTION myTransform ( x string )
RETURNS tsvector AS $$
  BEGIN
    -- modify x
    RETURN x::tsvector;
  END;
$$ LANGUAGE plpgsql
IMMUTABLE;