What does “collation” mean

terminology

I am learning sqlite from a book which has mentioned collation and collating sequence multiple times. What does it mean exactly in the database world?

Best Answer

Etymology

"Collation" is probably best defined on etymology.com,

late 14c., "act of bringing together and comparing,"

It hasn't changed at all in the past 600 years. "collate" means "to bring together" and whether it's the ordering of a book, chronologically or otherwise in a database, it's all the same.

Computer Science

As it applies to Computer Science, what we're doing is separating the storage mechanism from the ordering. You may have heard of ASCII-betical. That refers to a comparison based on the binary encoding of ASCII characters. In that system, storage and collation are two in the same. If the ASCII-standardized "encoding" ever changed the "collation" (order) would change too.

POSIX started to break that with LC_COLLATE. But as we move into Unicode a consortium emerged to standardize collations as well: ICU.

SQL

In the SQL spec there are two extensions to provide collations,

F690: “Collation support:

Includes collation name, collation clause, collation definition and drop collation.
F692: Extended collation support,

Includes attaching a different default collation to a column in the schema.

Essentially these provide the ability to CREATE and DROP collations, to specify them for operators and sorts, and to define a default for columns.

For more information on what SQL has to offer check out,

PostgreSQL documentation on collations
Oracle Linguistic Sorting and Matching
MS SQL Server, Collation and Unicode Support, and Collations
Character Sets and Collations in MySQL

Examples

Rather than pasting a limited example, here is the PostgreSQL test suite it's pretty extensive. Check out at least the first link and look for 'Türkiye' COLLATE "tr-x-icu" ILIKE '%KI%' AS "false"

collate.icu.utf8.out
collate.linux.utf8.out
collate.out Probably not worth checking out, but I placed it here for completeness, boring POSIX checks.

Best Answer

Etymology

Computer Science

SQL

Examples

Related Solutions

What do you call something that’s not the primary key

What does horizontal scaling mean

Related Question