Postgresql – Optimizing a “categorized” search table using trigrams

gist-indexindexperformancepostgresqlpostgresql-9.6postgresql-performance

I've been reading through this excellent answer to understand how pg_trgm works a bit, but I'm still unclear on the most efficient way of solving this query (efficient in terms of speed of search):

I have a table search that I run trigram searches on that looks like this:

Column      |  Type   | Modifiers
------------+---------+-----------
id          | bpchar  | collate C
user_id     | integer |
type        | text    |
search_on   | text    | collate C
data        | json    |
Indexes:
 "index_search_id" UNIQUE, btree (id)
 "index_search_search_on" gist (search_on gist_trgm_ops)
 "index_search_type" btree (type)
 "index_search_user_id" btree (user_id)

In this scenario, user_id is NULLable and type is also NULLable. The queries I'd run amount to these possibilities:

  1. Search for rows (WHERE user_id = 123 OR user_id IS NULL) AND search_on % 'mystring'
  2. Search for rows (WHERE user_id = 123 OR user_id IS NULL) AND type='my-type' AND search_on % 'mystring'

In plain words, I want all rows that have my user_id or NULL user_id, optionally are categorized by type, and match the term being passed in.

Right now I just have individual indexes on the 3 columns (as shown above) that can change based on the query. I understand however that a single index is generally more efficient.

Is it possible to use a single index that does trigram searches, but also exact match on user_id and type where they can optionally be NULL.

Best Answer

Is it possible to use a single index that does trigram searches, but also exact match on user_id and type where they can optionally be NULL.

Yes, NULL is included in indexes. And you can search for it like for any other value.

Yes, you can have a multicolumn trigram GiST index. But GiST indexes typically don't make sense for the data type integer. Btree indexes are better in every respect - except for your case of a multicolumn index. So Postgres does not install the required operator class by default. You need to install the additional module btree_gist first, once per database:

CREATE EXTENSION IF NOT EXISTS btree_gist;  -- only if not installed, yet

Then you can create your multicolumn index:

CREATE INDEX foo ON search USING gist (user_id, type, search_on gist_trgm_ops);

Related (with detailed instructions):

And get operator precedence in your WHERE clause right:

WHERE (user_id = 123 OR user_id IS NULL)  -- parentheses!
AND    search_on % 'mystring'

Or:

WHERE (user_id = 123 OR user_id IS NULL)
AND   (type = 'my-type' OR type IS NULL)
AND    search_on % 'mystring'

Depending on data distribution, cardinalities, selectivity of predicates, cost settings etc. Postgres may still prefer an index on one (or two) column(s) (occasionally).