MySQL – Differences Between utf8mb4 Binary Collations

Tags: collation, MySQL, utf-8

What is the difference between the utf8mb4_0900_bin and utf8mb4_bin binary collations?

Best Answer

There are three differences as far as I can tell (according to their documentation):

Case-mappings (for LOWER() / UPPER() functions):

https://dev.mysql.com/doc/refman/8.0/en/charset-unicode-sets.html#charset-unicode-sets-uca

The LOWER() and UPPER() functions perform case folding according to the collation of their argument.

The difference between the two collations in this context is that the _0900_ version, being based on a newer version of Unicode, quite likely has more mapping definitions (and possibly even some corrections).

Padding vs No Padding (of trailing spaces):

https://dev.mysql.com/doc/refman/8.0/en/charset-unicode-sets.html#charset-unicode-sets-pad-attributes

The pad attribute for utf8mb4_bin is PAD SPACE, whereas for utf8mb4_0900_bin it is NO PAD. Consequently, operations involving utf8mb4_0900_bin do not add trailing spaces, and comparisons involving strings with trailing spaces may differ for the two collations.

Essentially, comparisons under utf8mb4_bin ignore trailing spaces, while comparisons under utf8mb4_0900_bin treat them as significant. See the documentation (linked above) for an example.
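
The padding difference can be checked directly on a server. The following is a minimal sketch, assuming a recent MySQL 8.0 server (utf8mb4_0900_bin does not exist in older releases) and a session that is allowed to read INFORMATION_SCHEMA; the column aliases are made up for illustration.

```sql
-- Make sure string literals in this session are utf8mb4, otherwise the
-- COLLATE clauses below would be rejected as invalid for the character set.
SET NAMES utf8mb4;

-- Pad attribute of each collation, as recorded by the server.
SELECT COLLATION_NAME, PAD_ATTRIBUTE
FROM INFORMATION_SCHEMA.COLLATIONS
WHERE COLLATION_NAME IN ('utf8mb4_bin', 'utf8mb4_0900_bin');

-- The same comparison under each collation. COLLATE attaches to the
-- right-hand literal, and the explicit collation governs the comparison.
SELECT
  'a ' = 'a' COLLATE utf8mb4_bin      AS equal_under_bin,
  'a ' = 'a' COLLATE utf8mb4_0900_bin AS equal_under_0900_bin;
```

If the pad attributes are as documented, the first comparison should return 1 (the trailing space is ignored) and the second 0.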

Sorting (performance only, not the ordering):

https://dev.mysql.com/doc/refman/8.0/en/charset-unicode-sets.html#charset-unicode-sets-collating-weights

- For _bin collations except utf8mb4_0900_bin, the weight is based on the code point, possibly with leading zero bytes added.
- For utf8mb4_0900_bin, the weight is the utf8mb4 encoding bytes. The sort order is the same as for utf8mb4_bin, but much faster.

In plain terms: for a code point such as U+FF9D, utf8mb4_bin takes the UTF-8 encoded byte sequence EF BE 9D and converts it into the code point value 00 FF 9D to use as the weight, whereas utf8mb4_0900_bin uses the encoded bytes as they are. Skipping the conversion is safe because UTF-8 byte sequences sort in the same order as the code points they encode, so the extra step costs time without changing the ordering.
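
One way to look at the weights themselves is the WEIGHT_STRING() function. This is a sketch only, assuming a recent MySQL 8.0 server; the column aliases are made up for illustration, and WEIGHT_STRING() is documented as a debugging aid rather than something to rely on in application code.

```sql
SET NAMES utf8mb4;

-- 0xEFBE9D is the UTF-8 encoding of U+FF9D. The _utf8mb4 introducer marks
-- the hex literal as a utf8mb4 string so that a collation can be applied.
SELECT
  HEX(WEIGHT_STRING(_utf8mb4 0xEFBE9D COLLATE utf8mb4_bin))      AS weight_bin,
  HEX(WEIGHT_STRING(_utf8mb4 0xEFBE9D COLLATE utf8mb4_0900_bin)) AS weight_0900_bin;
```

Going by the documentation quoted above, the first column should show the code point form (00FF9D) and the second the unchanged encoding bytes (EFBE9D).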

Related Solutions

sql-server – Difference Between Transaction Logs and Binary Logs

Answering only the MySQL Part of the Question

A binary log records completed SQL statements. You can have many binary logs. Under default settings, a binary log rotates when it reaches 1 GB (see max_binlog_size); expire_logs_days controls how long old binary logs are kept before they are purged.
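
The two settings named above can be inspected on a running server. A small sketch, assuming a session with ordinary privileges; on newer MySQL versions expire_logs_days may be deprecated or absent, in which case the second statement simply returns an empty result.

```sql
-- Size at which the server starts a new binary log file (value in bytes).
SHOW VARIABLES LIKE 'max_binlog_size';

-- Retention period for old binary logs, in days.
SHOW VARIABLES LIKE 'expire_logs_days';
```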

You can see binary logs by running one of the following:

SHOW BINARY LOGS;
SHOW MASTER LOGS;

The current binary log is always the last one in the list. To see just the current one, run this:

SHOW MASTER STATUS;

When it comes to the InnoDB storage engine and transactions:

- There is a metadata file (ibdata1, which holds, by default, data pages, index pages, table metadata and MVCC information), also known as the InnoDB tablespace file.
- You can have more than one ibdata file (see innodb_data_file_path).
- There are redo logs (ib_logfile0 and ib_logfile1).
- You can have more than two redo logs (see innodb_log_files_in_group).
- You can spread data and indexes across multiple ibdata files if innodb_file_per_table is disabled.
- You can separate data and index pages from ibdata into separate tablespace files (see innodb_file_per_table and StackOverflow Post on how to set this up).
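
To see how a particular server is configured for the options mentioned in this list, something like the following can be run. It is a sketch that only reads settings; depending on the MySQL version, some of these variables may be deprecated or missing, and missing ones are just left out of the result.

```sql
-- List the InnoDB settings referred to above in one result set.
SHOW VARIABLES
WHERE Variable_name IN (
  'innodb_data_file_path',      -- ibdata file names and sizes
  'innodb_log_files_in_group',  -- number of redo log files
  'innodb_file_per_table'       -- ON = one tablespace file per table
);
```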

SQL Server – Reasons for Databases Created with Different Collations

The COLLATE clause of CREATE DATABASE can be used to specify a collation different from the Default Server collation. Also, if the databases were backed up from another server, they would be restored with the collation they had on the source server.

If they are created using CREATE DATABASE and a collation is not specified via the COLLATE keyword, then the default collation would be used.
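
Both cases can be seen side by side with a short T-SQL sketch. CollationDemo is a hypothetical database name, and Latin1_General_100_CI_AS is only an example of a collation that may differ from the server default; substitute whatever fits the environment.

```sql
-- Create a database with an explicit collation instead of the server default.
CREATE DATABASE CollationDemo COLLATE Latin1_General_100_CI_AS;

-- Default collation of the server instance.
SELECT SERVERPROPERTY('Collation') AS server_default_collation;

-- Collation of every database on the instance, for comparison.
SELECT name, collation_name
FROM sys.databases
ORDER BY name;
```

Databases whose collation_name differs from the server default were either created with a COLLATE clause or restored from a server that used a different collation.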
