Mariadb – Change default collation for character set utf8mb4 to utf8mb4_unicode_ci

character-setcollationmariadbmariadb-10.3

When creating a database without specifying a character set or collation the servers defaults are used (as expected).

MariaDB [(none)]> SHOW VARIABLES LIKE '%_server' ;
+----------------------+--------------------+
| Variable_name        | Value              |
+----------------------+--------------------+
| character_set_server | utf8mb4            |
| collation_server     | utf8mb4_unicode_ci |
+----------------------+--------------------+

MariaDB [(none)]> CREATE DATABASE `test-without-charset` ;
MariaDB [(none)]> SELECT `DEFAULT_COLLATION_NAME` FROM `information_schema`.`SCHEMATA` WHERE `SCHEMA_NAME` LIKE 'test-without-charset';
+------------------------+
| DEFAULT_COLLATION_NAME |
+------------------------+
| utf8mb4_unicode_ci     |
+------------------------+

However, when specifying the character set within the CREATE DATABASE-query, the default collation changes to utf8mb4_general_ci.

MariaDB [(none)]> CREATE DATABASE `test-with-charset` CHARACTER SET utf8mb4 ;
MariaDB [(none)]> SELECT `DEFAULT_COLLATION_NAME` FROM `information_schema`.`SCHEMATA` WHERE `SCHEMA_NAME` LIKE 'test-with-charset';
+------------------------+
| DEFAULT_COLLATION_NAME |
+------------------------+
| utf8mb4_general_ci     |
+------------------------+

I already found out that (mysql-manual)

If CHARACTER SET charset_name is specified without COLLATE, character
set charset_name and its default collation are used. To see the
default collation for each character set, use the SHOW CHARACTER SET
statement or query the INFORMATION_SCHEMA CHARACTER_SETS table.

And indeed it shows utf8mb4_general_ci, so it is following the rules

MariaDB [(none)]> SHOW CHARACTER SET LIKE 'utf8mb4';
+---------+---------------+--------------------+--------+
| Charset | Description   | Default collation  | Maxlen |
+---------+---------------+--------------------+--------+
| utf8mb4 | UTF-8 Unicode | utf8mb4_general_ci |      4 |
+---------+---------------+--------------------+--------+

So my question is: How do I change this default collation for the character set utf8mb4. Is there some configuration-file I can change to alter this behaviour? I'd really like those two to be consistent.

Off course I tried Google to find anything relevant, but all I can find is changing the collation_server-setting.

Server version: 10.3.15-MariaDB-log MariaDB Server

Best Answer

I don't think there is a way to change that DEFAULT.

Anyway, it would be better to use utf8mb4_unicode_520_ci, which is based on a later Unicode standard.

Just get into the habit of specifying CHARACTER SET and COLLATION on all connections and CREATE TABLEs. MySQL and MariaDB are gradually changing from latin1_swedish_ci to utf8mb4_0900_ai_ci. MariaDB is not there yet, but I expect them to move soon. And "900" is probably not the last Unicode standard.

By explicitly specifying the charset and collation, you maintain control and consistency, even if it is an out-dated pair.

A compromise...

But charset and collation on CREATE DATABASE. Then any tables built without specific settings will inherit those settings. And columns within that table will inherit from the table's settings.