PostgreSQL – Search for Partial Matches in Website Domain Names

postgresqlstring-searching

I have a Postgres database with information about websites and a table called sites with a column host. This already has an index for text pattern operations which is excellent for searching from the start of the domain name: WHERE host LIKE 'www.bran% but not so good for partial matches such as WHERE host LIKE '%.bran%' which result in a scan. There are currently around 7.5 million hosts so search performance is already noticeable.

Based on the suggestion below I've added a trigrm index (you need to enable to extension in order to do this:

create extension pg_trgm;

and then add a suitable index:

CREATE INDEX trgm_idx ON sites USING GIST (host gist_trgm_ops);

As with all indices this may take some time depending upon the size of your database.

Before the index my queries were taking about 180s to find 200 items in 7.5 million but this is complicated by a correlated subquery, which seems to be complicating the matter and dominating the query and suggests I should be looking at optimising something else first, specifically a date index.

So the results of the comparison: using a text_ops index takes about 2s and using a trigram about 7. I think I might have more luck if I create an index that only contains the relevant part of the host name.

Best Answer

This looks like a perfect use for a trigram index. As long as your queries don't get much shorter than the one shown. If you want to search for '%.br%', then probably not much is going to help you.