Postgresql – Case-insensitive collation still comparing case-sensitive

case sensitivecollationpostgresqlpostgresql-12

I am currently trying to create a table with a text column which will compare case insensitive by default. This is because we have a third party program that executes a search on our database. The SELECT statements used by this program can not be altered.

The abstract problem is that we somehow need this search to be case insensitive but it currently is case sensitive.

I read that Postgres 12 does support non-deterministic collations which allow for this behavior.

I have a Postgres Server (Version PostgreSQL 12.1, compiled by Visual C++ build 1914, 64-bit) installed on a German Windows machine.

So for testing purposes, I created a new database to test on:

CREATE DATABASE collation_test
    WITH 
    OWNER = postgres
    ENCODING = 'UTF8'
    CONNECTION LIMIT = -1;

In this database I created the following collation I found in an article about these collations

CREATE COLLATION collat_ci (
  provider = 'icu',
  locale = 'und-u-ks-level2',
  deterministic = false
);

After this, I needed a table to test this collation with

CREATE TABLE public.person
(
    "Id" bigint NOT NULL,
    "Name" text COLLATE public.collat_ci,
    PRIMARY KEY ("Id")
);

ALTER TABLE public.person
    OWNER to postgres;

INSERT INTO person VALUES
(1, 'Robin'),
(2, 'robin');

So now I tried the following select query:

SELECT 
(
    SELECT "Name" FROM person p1 WHERE p1."Id" = 1
)
= 'Robin';

Which returned true as expected as the text in the database and the literal given match completely.

But if I try the same with a lower case r which I would expect to match because of my collation but it still returns false.

SELECT 
(
    SELECT "Name" FROM person p1 WHERE p1."Id" = 1
)
= 'robin';

When trying to compare both inserted rows' name to each other I still got false as a result:

SELECT 
(
    SELECT "Name" FROM person p1 WHERE p1."Id" = 1 --'Robin'
)
=
(
    SELECT "Name" FROM person p2 WHERE p2."Id" = 2 --'robin'
);

Does anyone know why my comparison does not behave as expected and how to get it to do so?

Best Answer

The version of ICU shipped with the Windows builds is a pretty old one, so maybe that's the reason.

Try

CREATE COLLATION collat_ci (
  provider = 'icu',
  locale = '@colStrength=secondary',
  deterministic = false
);

This should work with older ICU versions.