PostgreSQL – Why would you index text_pattern_ops on a text column?

collation, index, pattern matching, postgresql

Today Seven Databases in Seven Weeks introduced me to per-operator indexes.

You can index strings for pattern matching the previous queries by creating a text_pattern_ops operator class index, as long as the values are indexed in lowercase.

CREATE INDEX moves_title_pattern ON movies (lower(title) text_pattern_ops);

We used the text_pattern_ops because the title is of type text. If you need to index varchars, chars, or names, use the related ops: varchar_pattern_ops, bpchar_pattern_ops, and name_pattern_ops.
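The book's index only helps queries that use the same expression, lower(title), with a left-anchored pattern such as lower(title) LIKE 'star%'. What follows is a rough Python sketch of what such an index does, not how PostgreSQL implements it. The TITLES list and the helper names are hypothetical. Python's str.lower() stands in for PostgreSQL's lower() (they agree for ASCII), and a sorted list searched with bisect stands in for the B-tree.

```python
from bisect import bisect_left

# Hypothetical data standing in for the movies.title column.
TITLES = ["Star Wars", "STAR TREK", "Stargate", "Alien", "star-crossed", "Blade Runner"]

# An expression index on lower(title) stores the lowercased value as its key,
# kept in plain code-point order (the ordering text_pattern_ops uses).
INDEX = sorted((title.lower(), title) for title in TITLES)
KEYS = [key for key, _ in INDEX]


def prefix_upper_bound(prefix: str) -> str:
    """Smallest string that sorts after every string beginning with `prefix`.

    Assumes the last character is not the maximum code point (U+10FFFF).
    """
    if not prefix:
        raise ValueError("an empty prefix matches everything")
    return prefix[:-1] + chr(ord(prefix[-1]) + 1)


def like_prefix(prefix: str) -> list[str]:
    """Emulate WHERE lower(title) LIKE '<prefix>%' as a range scan on INDEX."""
    lo = bisect_left(KEYS, prefix)
    hi = bisect_left(KEYS, prefix_upper_bound(prefix))
    return [title for _, title in INDEX[lo:hi]]
```

Here like_prefix("star") returns all four "Star" titles, whatever their original case. A query written as title LIKE 'Star%' could not use this index, because the index keys are the values of lower(title), not of title itself.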

I find the example really confusing. Why is it useful to do this?

If the column is type text, wouldn't the other types (varchar, char, name) be cast to text before being used as a search value?

How does that index behave differently from one using the default operator?

CREATE INDEX moves_title_pattern ON movies (lower(title));

Best Answer

The documentation often answers questions like this one, and it does here too:

The operator classes text_pattern_ops, varchar_pattern_ops, and bpchar_pattern_ops support B-tree indexes on the types text, varchar, and char respectively. The difference from the default operator classes is that the values are compared strictly character by character rather than according to the locale-specific collation rules. This makes these operator classes suitable for use by queries involving pattern matching expressions (LIKE or POSIX regular expressions) when the database does not use the standard "C" locale. As an example, you might index a varchar column like this:
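A rough Python sketch of the difference the documentation describes, under stated assumptions. The WORDS list is hypothetical. toy_locale_key is an invented stand-in, loosely modeled on collations that ignore case and punctuation at the first comparison level; it is not the real en_GB.UTF-8 rule set. A sorted list searched with bisect stands in for the B-tree. Both orderings get the same naive index scan for LIKE 'a-%': take the rows sorted between the prefix and its successor string, then recheck each row.

```python
import re
from bisect import bisect_left

# Hypothetical column values; "a-" is the LIKE prefix being searched for.
WORDS = ["a-z", "a-b", "ab", "a.c", "b"]


def c_key(s: str) -> str:
    """Character-by-character (code point) order, as text_pattern_ops uses."""
    return s


def toy_locale_key(s: str) -> tuple[str, str]:
    """Toy stand-in for a locale collation: ignore case and punctuation at the
    first comparison level; raw code points only break ties."""
    return (re.sub(r"[^0-9a-z]", "", s.casefold()), s)


def like_prefix_scan(words, prefix, key):
    """Emulate an index scan for LIKE '<prefix>%': keep only the rows sorted
    between prefix and its successor string, then recheck each row."""
    ordered = sorted(words, key=key)
    keys = [key(w) for w in ordered]
    upper = prefix[:-1] + chr(ord(prefix[-1]) + 1)  # e.g. "a-" -> "a."
    lo = bisect_left(keys, key(prefix))
    hi = bisect_left(keys, key(upper))
    return [w for w in ordered[lo:hi] if w.startswith(prefix)]
```

In code-point order the matches "a-b" and "a-z" sit together between "a-" and "a.", so like_prefix_scan(WORDS, "a-", c_key) finds both. In the toy locale order the bounds no longer bracket the matches ("a-z" sorts after "a.c", and both bounds sort before every row), so the same scan returns an empty list. Character-by-character comparison is what keeps all matches of a left-anchored pattern inside one key range.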

CREATE INDEX test_index ON test_table (col varchar_pattern_ops);

Note that you should also create an index with the default operator class if you want queries involving ordinary <, <=, >, or >= comparisons to use an index. Such queries cannot use the xxx_pattern_ops operator classes. (Ordinary equality comparisons can use these operator classes, however.) It is possible to create multiple indexes on the same column with different operator classes.
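To see why a range comparison cannot use a pattern_ops index while an equality comparison can, here is another rough Python sketch. The titles are hypothetical, and toy_locale_key is an invented, deliberately simplified stand-in for a locale collation (case-insensitive first, raw code points only as a tie-breaker); real collations such as en_GB.UTF-8 have many more rules.

```python
# Hypothetical titles.
TITLES = ["Zebra", "apple", "Banana", "cherry"]


def c_key(s: str) -> str:
    """Plain code-point order, the ordering a pattern_ops index is sorted by."""
    return s


def toy_locale_key(s: str) -> tuple[str, str]:
    """Toy locale order: case-insensitive first, raw code points only to break
    ties, so only identical strings compare equal."""
    return (s.casefold(), s)


def less_than(values, bound, key):
    """Rows satisfying value < bound under the ordering given by `key`."""
    return sorted(v for v in values if key(v) < key(bound))


def equal_to(values, target, key):
    """Rows satisfying value = target under the ordering given by `key`."""
    return sorted(v for v in values if key(v) == key(target))
```

Under the toy locale ordering, title < 'b' should return only "apple", but a scan in code-point order also returns "Banana" and "Zebra", because uppercase letters sort before lowercase ones there. Equality returns the same rows under both orderings, since PostgreSQL's ordinary (deterministic) collations treat strings as equal only when they are byte-for-byte identical.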

The documentation goes on to say:

If you do use the C locale, you do not need the xxx_pattern_ops operator classes, because an index with the default operator class is usable for pattern-matching queries in the C locale.

You can check your locale as follows (it is likely to be something like en_GB.UTF-8 rather than "C"; UTF8 on its own is an encoding, not a collation):

postgres=> show lc_collate;
 lc_collate
-------------
 en_GB.UTF-8