PostgreSQL 9.4 – Indexing a Field Containing People’s Names

full-text-searchpostgresqlpostgresql-9.4

I have a materialized view of around 200 million records and would like to add an index to the 'full_names' field to speed up the search. The name data is stored like this;

| full_names                                                    |
-----------------------------------------------------------------
| Smith, John | Doe, Jane,    | Doe, John                       |
| Smith, James| Smith, Clare, | Doe, John                       |
| Adams, John | Doe, Jane,    | Doe, Steven                     |

Since my search term uses 'ILIKE' and '%' a traditional btree index will not work. I had intended to use a trigram index using the following:

CREATE INDEX idx_full_names_trgm_gist ON sample_table USING gisy(full_names gist_trgm_ops);

but received this error:

ERROR: index row requires 10216 bytes, maximum size is 8191

My next thought was to us a ts_vector but my understanding was that ts_vector breaks down words to dictionary defined lexemes. I assume this won't work with names since they are not dictionary words?

Any suggestions on the best way to index this data for fast searching?

This data is largely static and is not updated regularly. External search tools like Solr are not currently an option dues to resrtictions on the server so I was hoping to use postgres' native functions.

Postgres 9.5/RHEL 6.8

Best Answer

Try building a GIN index with gin_trgm_ops instead of GiST. I've never found GiST useful for trgm indexes.

That said, if your are failing with that message, you probably have some weird data in your table that should be cleaned up. Can you figure which row is causing the problem? (If not, start with the longest records first as the most likely suspects)