PostgreSQL – Equivalent of UTF8_UNICODE_CI Collation

case sensitivecollationencodingpattern matchingpostgresql

I would like a column in a table inside a PostgreSQL database (I am using version 9.6). I know of the UTF8_UNICODE_CI collation on MySQL, so I tried:

CREATE TABLE thing (
    id    BIGINT PRIMARY KEY
   ,name  VARCHAR(120) NOT NULL COLLATE "UTF8_UNICODE_CI"
);

but I get:

ERROR: collation "UTF8_UNICODE_CI" for encoding "UTF8" does not exist

Reading around, I found that the pg_collation table displays collations, which showed:

=# SELECT * from pg_collation;
 collname | collnamespace | collowner | collencoding | collcollate | collctype
----------+---------------+-----------+--------------+-------------+-----------
 default  |            11 |        10 |           -1 |             |
 C        |            11 |        10 |           -1 | C           | C
 POSIX    |            11 |        10 |           -1 | POSIX       | POSIX
(3 rows)

So it's clearly not there… Is there any way to add it, or obtain the behavior I want? (I want a search WHERE name LIKE '%lala%' to match 'lalá', 'LÂLÄ', etc.)

Best Answer

In Postgres, you would typically solve this differently.

For starters, just use ILIKE for case insensitive matching. The manual:

The key word ILIKE can be used instead of LIKE to make the match case-insensitive according to the active locale. This is not in the SQL standard but is a PostgreSQL extension.

WHERE name ILIKE '%lala%'

You do not need a different collation for this. Trigram indexes support both case sensitive and insensitive pattern matching:

Or use the more portable lower(col) LIKE

WHERE lower(name) LIKE '%lala%'  -- pattern must be lower case, too.

Or you could use the special data type citext provided by the additional module citext:

If you insist on a special collation, you can create your own:

I would not do that.

If you want to take string "normalization" one step further (remove all diacritic signs), look to unaccent():

You can combine both, instructions in the linked answer. More here:

Overview:


I would just use ILIKE and support it with a trigram index for big tables. Preferably use the latest version Postgres 9.6. It has several improvements for trigram GIN indexes.