Postgresql – postgres: index to SELECT strings like ‘%foo%’;

indexindex-tuningperformancepostgresqlpostgresql-performance

I have an expensive, periodic PostgreSQL 9.3 query of the form:

SELECT * from mytable where name NOT LIKE '%foo%';

foo is actually a constant here that never changes. The query is expensive because it requires a table scan. I expect that only a few rows will actually match the query, so I want to use a partial index to speed up the query.

CREATE INDEX foo_idx ON mytable ((name NOT LIKE '%foo%')) WHERE name NOT LIKE '%foo%';

However, when I do an EXPLAIN on my query, it still is using a sequential scan instead of relying on the new index. What am I doing wrong?

I have tried several different expressions inside the parenthesized expression clause of the CREATE INDEX statement, but nothing helps:

CREATE INDEX foo_idx ON mytable (name) WHERE name NOT LIKE '%foo%';
CREATE INDEX foo_idx ON mytable ((name LIKE '%foo%')) WHERE name NOT LIKE '%foo%';

I have also tried removing the WHERE clause for the index, and that doesn't help either.

Best Answer

The index should work as is. As pointed out by @jjanes, you just needed to run ANALYZE. However, I suggest you modify the index. The expression in your definition is not useful.

CREATE INDEX foo_idx ON intentions ((data LIKE '%foo%')) WHERE data NOT LIKE '%foo%';

The expression is always FALSE and useless noise. Either simplify to the slightly cheaper equivalent index:

CREATE INDEX foo_idx ON intentions ((FALSE)) WHERE data NOT LIKE '%foo%';

Or, better yet, put something useful there. Since each index entry is going to allocate MAXALIGN (typically 8 bytes) for data anyway, and the overhead per index row is another 12 bytes (and the default fillfactor for btree indexes is 90%) you might as well put the index to good use and help other queries. If you have any other queries that might use the same partial index?

If you put something worth up to 8 bytes (like one or two int4 columns), the index retains exactly the same size and performance, but offers more possible use cases. If (one of) the column(s) is involved in updates, you have some additional maintenance cost. Still, if you have anything useful, use it. Any additional column that's used in a JOIN / WHERE or ORDER BY clause. Or column(s) that enable index-only scans:

CREATE INDEX foo_idx ON intentions (<useful_column(s)>)
WHERE data NOT LIKE '%foo%';