What is the difference between the utf8mb4_0900_bin and utf8mb4_bin binary collations?
MySQL – Differences Between utf8mb4 Binary Collations
Tags: collation, MySQL, utf-8
Best Answer
As far as I can tell from the MySQL documentation, there are three differences:
Case mappings (for the LOWER()/UPPER() functions):
https://dev.mysql.com/doc/refman/8.0/en/charset-unicode-sets.html#charset-unicode-sets-uca
The difference between the two collations in this context is that the _0900_ version, being based on a newer version of Unicode (the "0900" refers to UCA 9.0.0), quite likely has more case-mapping definitions (and possibly even some corrections).

Padding vs. no padding (of trailing spaces):
https://dev.mysql.com/doc/refman/8.0/en/charset-unicode-sets.html#charset-unicode-sets-pad-attributes
Essentially, utf8mb4_bin is a PAD SPACE collation, so trailing spaces are ignored in comparisons, while utf8mb4_0900_bin is a NO PAD collation, so trailing spaces are significant. See the documentation (linked above) for an example.

Sorting (a performance difference only; the resulting order is the same):
https://dev.mysql.com/doc/refman/8.0/en/charset-unicode-sets.html#charset-unicode-sets-collating-weights
In plain terms: for a code point such as U+FF9D, utf8mb4_bin takes the UTF-8 encoded byte sequence EF BE 9D and converts it into the code point value, giving the weight 00 FF 9D. utf8mb4_0900_bin, on the other hand, skips that conversion and uses the UTF-8 bytes directly. It can do this because UTF-8 byte sequences already sort in the same order as the code points they encode, so the resulting ordering is identical. So why bother with that extra conversion step?
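For the case-mapping point, Cherokee is a handy illustration of how case-mapping data grows between Unicode versions: the letters starting at U+13A0 were caseless in early Unicode, and Unicode 8.0 reclassified them as uppercase and added lowercase counterparts starting at U+AB70. The Python sketch below prints what Python's own Unicode tables (Unicode 8.0 or later in any Python 3.5+) say about them. It does not query MySQL; whether utf8mb4_bin's LOWER()/UPPER() actually lacks these particular mappings is an assumption, and the helpers describe() and case_mappings() are hypothetical names used only for illustration.

```python
import unicodedata

# Cherokee letters gained case mappings in Unicode 8.0. Before that they were
# caseless, so case mapping based on older Unicode data would leave them unchanged.
# This uses Python's Unicode tables, not MySQL's (illustrative assumption).
CHEROKEE_LETTER_A = "\u13a0"        # present long before Unicode 8.0
CHEROKEE_SMALL_LETTER_A = "\uab70"  # added in Unicode 8.0


def describe(ch: str) -> str:
    """Return the code point, character name and general category of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)} ({unicodedata.category(ch)})"


def case_mappings(ch: str) -> tuple:
    """Return (lowercase, uppercase) according to Python's Unicode tables."""
    return ch.lower(), ch.upper()


if __name__ == "__main__":
    print("Python's Unicode data version:", unicodedata.unidata_version)
    for ch in (CHEROKEE_LETTER_A, CHEROKEE_SMALL_LETTER_A):
        lower, upper = case_mappings(ch)
        print(describe(ch), "| lower:", describe(lower), "| upper:", describe(upper))
```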
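For the padding point, here is a minimal Python sketch of the two comparison rules, assuming that a PAD SPACE binary comparison behaves like padding the shorter string with spaces and then comparing code points, and that NO PAD compares the strings exactly as they are. The functions compare_pad_space() and compare_no_pad() are hypothetical; they model the documented behaviour rather than reproduce MySQL's code.

```python
# Sketch of PAD SPACE vs NO PAD comparison for a binary (code point) collation.
# Each function returns -1, 0 or 1, like a classic three-way compare.


def _three_way(a: str, b: str) -> int:
    # Python compares str objects code point by code point.
    return (a > b) - (a < b)


def compare_pad_space(a: str, b: str) -> int:
    """PAD SPACE: pad the shorter string with trailing spaces, then compare."""
    width = max(len(a), len(b))
    return _three_way(a.ljust(width), b.ljust(width))


def compare_no_pad(a: str, b: str) -> int:
    """NO PAD: trailing spaces are ordinary characters and take part in the comparison."""
    return _three_way(a, b)


if __name__ == "__main__":
    for left, right in [("a", "a "), ("abc", "abc   "), ("a ", "b")]:
        print(repr(left), repr(right),
              "PAD SPACE:", compare_pad_space(left, right),
              "NO PAD:", compare_no_pad(left, right))
```

Under these rules 'a' and 'a ' compare equal with PAD SPACE but not with NO PAD, which is why, for example, a UNIQUE index on a utf8mb4_bin column would be expected to reject 'a ' as a duplicate of 'a', while the same index under utf8mb4_0900_bin would accept both values.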
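For the sorting point, the sketch below computes both kinds of weight in Python for the answer's example character U+FF9D and checks that they sort a small sample of strings identically. weight_code_points() mimics the described utf8mb4_bin behaviour (each code point as three big-endian bytes, which is enough for the maximum code point U+10FFFF) and weight_utf8_bytes() mimics the described utf8mb4_0900_bin behaviour (the UTF-8 bytes used as-is). Both names are hypothetical, and this illustrates the idea rather than MySQL's implementation.

```python
# Two ways of turning a string into sort weights, following the answer's
# description of the two collations (a sketch, not MySQL's actual code).


def weight_code_points(s: str) -> bytes:
    """utf8mb4_bin style: decode to code points, emit each as 3 big-endian bytes.

    Three bytes are enough because the highest code point is U+10FFFF.
    """
    return b"".join(ord(ch).to_bytes(3, "big") for ch in s)


def weight_utf8_bytes(s: str) -> bytes:
    """utf8mb4_0900_bin style: use the UTF-8 encoded bytes unchanged."""
    return s.encode("utf-8")


def hex_bytes(b: bytes) -> str:
    """Format bytes as space-separated uppercase hex, e.g. 'EF BE 9D'."""
    return " ".join(f"{x:02X}" for x in b)


# Mix of 1-, 2-, 3- and 4-byte UTF-8 characters (no surrogates).
SAMPLE = ["z", "a", "ab", "\u00e9", "\uff9d", "\U0001f600", "\u20ac", "A"]

if __name__ == "__main__":
    ch = "\uff9d"
    print("UTF-8 bytes:      ", hex_bytes(weight_utf8_bytes(ch)))   # EF BE 9D
    print("code point weight:", hex_bytes(weight_code_points(ch)))  # 00 FF 9D
    by_code_point = sorted(SAMPLE, key=weight_code_points)
    by_utf8 = sorted(SAMPLE, key=weight_utf8_bytes)
    print("same order:", by_code_point == by_utf8)                  # True
```

The two orders agree because UTF-8 was designed so that byte-wise comparison of encoded strings gives the same result as comparing their code points; the second function simply skips the decoding step.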