What does “collation” mean

terminology

I am learning sqlite from a book which has mentioned collation and collating sequence multiple times. What does it mean exactly in the database world?

Best Answer

Etymology

"Collation" is probably best defined on etymology.com,

late 14c., "act of bringing together and comparing,"

It hasn't changed at all in the past 600 years. "collate" means "to bring together" and whether it's the ordering of a book, chronologically or otherwise in a database, it's all the same.

Computer Science

As it applies to Computer Science, what we're doing is separating the storage mechanism from the ordering. You may have heard of ASCII-betical. That refers to a comparison based on the binary encoding of ASCII characters. In that system, storage and collation are two in the same. If the ASCII-standardized "encoding" ever changed the "collation" (order) would change too.

POSIX started to break that with LC_COLLATE. But as we move into Unicode a consortium emerged to standardize collations as well: ICU.

SQL

In the SQL spec there are two extensions to provide collations,

  • F690: “Collation support:

    Includes collation name, collation clause, collation definition and drop collation.

  • F692: Extended collation support,

    Includes attaching a different default collation to a column in the schema.

Essentially these provide the ability to CREATE and DROP collations, to specify them for operators and sorts, and to define a default for columns.

For more information on what SQL has to offer check out,

Examples

Rather than pasting a limited example, here is the PostgreSQL test suite it's pretty extensive. Check out at least the first link and look for 'Türkiye' COLLATE "tr-x-icu" ILIKE '%KI%' AS "false"