I would like a column in a table inside a PostgreSQL database (I am using version 9.6). I know of the UTF8_UNICODE_CI
collation on MySQL, so I tried:
CREATE TABLE thing (
id BIGINT PRIMARY KEY
,name VARCHAR(120) NOT NULL COLLATE "UTF8_UNICODE_CI"
);
but I get:
ERROR: collation "UTF8_UNICODE_CI" for encoding "UTF8" does not exist
Reading around, I found that the pg_collation
table displays collations, which showed:
=# SELECT * from pg_collation;
collname | collnamespace | collowner | collencoding | collcollate | collctype
----------+---------------+-----------+--------------+-------------+-----------
default | 11 | 10 | -1 | |
C | 11 | 10 | -1 | C | C
POSIX | 11 | 10 | -1 | POSIX | POSIX
(3 rows)
So it's clearly not there… Is there any way to add it, or obtain the behavior I want? (I want a search WHERE name LIKE '%lala%'
to match 'lalá'
, 'LÂLÄ'
, etc.)
Best Answer
In Postgres, you would typically solve this differently.
For starters, just use
ILIKE
for case insensitive matching. The manual:You do not need a different collation for this. Trigram indexes support both case sensitive and insensitive pattern matching:
Or use the more portable
lower(col) LIKE
Or you could use the special data type
citext
provided by the additional module citext:If you insist on a special collation, you can create your own:
I would not do that.
If you want to take string "normalization" one step further (remove all diacritic signs), look to
unaccent()
:You can combine both, instructions in the linked answer. More here:
Overview:
I would just use
ILIKE
and support it with a trigram index for big tables. Preferably use the latest version Postgres 9.6. It has several improvements for trigram GIN indexes.