ICU Specifies different LDML Collation Settings. Some of them seem pretty interesting, especially the ones on case and accent,
- “Ignore accents”:
strength=primary
- “Ignore accents” but take case into account:
strength=primary caseLevel=on
- “Ignore case”:
strength=secondary
- “Ignore punctuation” (completely):
strength=tertiary alternate=shifted
- “Ignore punctuation” but distinguish among punctuation marks:
strength=quaternary alternate=shifted
potentially a better method of doing what
You can also see these documented here. Are these ICU options and settings possible with PostgreSQL 10 ICU collation support?
CREATE COLLATION special (provider = icu, locale = 'en@strength=primary');
SELECT 'Å' LIKE 'A' COLLATE "special"; # returns false
I've also tried the CLDR BCP47
Starting with ICU 54, collation attributes can be specified via locale keywords as well, in the old locale extension syntax ("
el@colCaseFirst=upper
") or in language tag syntax ("el-u-kf-upper"). Keywords and values are case-insensitive. See the LDML Collation spec, Collation Settings, and the data file listing the valid collation keywords and their values. (The deprecated attributes kh/colHiraganaQuaternary and vt/variableTop are not supported.)
For that, this looked right
CREATE COLLATION special (provider = icu, locale = 'en-ks-level1');
SELECT 'Å' LIKE 'A' COLLATE "special"; # returns false
Also tried en-u-ks-level1
That method seems to be what the docs go for,
CREATE COLLATION german_phonebook (provider = icu, locale = 'de-u-co-phonebk');
Best Answer
Case insensitive or accent-insensitive collations cannot be used prior to PostgreSQL 12, because internally PostgreSQL considers that strings with a different binary representation are not equal. When the collation-aware comparator says they are equal, it uses the non-collation-aware
strcmp()
function as a tie-breaker, to obtain what Unicode calls a "deterministic" comparison without normalization.Starting with PostgreSQL 12, collations have a
deterministic
property, which must be set tofalse
to benefit from equality of non-binary equal strings. FromCREATE COLLATION
:The exact query in the question is not possible with nondeterministic collations because they don't support
LIKE
or any form of pattern matching (as of PostgreSQL 12). Also the locale should have a-u-
before the collation subtags, otherwise they're going to be silently ignored by the collator.What does work as intended: