MySQL – Reasons for the multiplicity of character set/collation settings in MySQL

amazon-rds, MySQL

I spent the last 48 hours or so trying to determine the cause of sporadic crashes of my MySQL (Aurora) DB. It turns out they were attributable to my mixing of database, table, column, and stored procedure collations/charsets. My bad, or is it?

Aurora aside, what were the practical reasons for having so many ways to configure charset/collation in MySQL? Why couldn't utf8 just become utf8mb4 as part of a major release? Why does changing the charset or collation of a database/table not actually change the existing tables or columns, only future additions to the schema? That last behavior makes it possible to end up with a mix of column charsets/collations within a table, for instance. What sane person would want different charsets/collations within the same DB? And if there is a rationale for all this madness, why not make it the exception rather than the norm when configuring mysqld?

In my case, to resolve the mysqld crashes, I had to ensure that all of the queries below return consistent charset/collation values:

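-- Stored procedures (character_set_client, collation_connection, Database Collation)
show procedure status where db = '<database>';
-- Database defaults
SELECT default_character_set_name, default_collation_name
  FROM information_schema.SCHEMATA WHERE schema_name = '<database>';
-- Tables
SELECT T.table_name, CCSA.character_set_name, T.table_collation
  FROM information_schema.`TABLES` T
  JOIN information_schema.`COLLATION_CHARACTER_SET_APPLICABILITY` CCSA
    ON CCSA.collation_name = T.table_collation
 WHERE T.table_schema = '<database>';
-- Columns (character_set_name and collation_name are NULL for non-string columns)
SELECT table_name, column_name, character_set_name, collation_name
  FROM information_schema.`COLUMNS` WHERE table_schema = '<database>';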
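A possible complement to these checks, offered as a sketch rather than a tested recipe: instead of listing every value and eyeballing it, the two queries below return only the mismatches. They reuse the same <database> placeholder. For stored routines, information_schema.ROUTINES exposes the character_set_client, collation_connection and database_collation values captured when each routine was created.

```sql
-- Columns whose collation differs from their table's default collation
SELECT C.table_name, C.column_name,
       C.collation_name AS column_collation,
       T.table_collation
  FROM information_schema.`COLUMNS` C
  JOIN information_schema.`TABLES` T
    ON T.table_schema = C.table_schema
   AND T.table_name   = C.table_name
 WHERE C.table_schema = '<database>'
   AND C.collation_name IS NOT NULL          -- skip non-character columns
   AND C.collation_name <> T.table_collation;

-- Stored routines whose captured collations differ from the database default
SELECT R.routine_name, R.character_set_client,
       R.collation_connection, R.database_collation,
       S.default_collation_name
  FROM information_schema.ROUTINES R
  JOIN information_schema.SCHEMATA S
    ON S.schema_name = R.routine_schema
 WHERE R.routine_schema = '<database>'
   AND (R.database_collation   <> S.default_collation_name
     OR R.collation_connection <> S.default_collation_name);
```

An empty result from both queries means nothing in the schema deviates from its parent's default.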

Example crash stack trace, which appears to be tied to my mixing of charsets/collations:

  mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed, 
something is definitely wrong and this may fail.

key_buffer_size=16777216
read_buffer_size=262144
max_used_connections=22
max_threads=2000
thread_count=18
connection_count=18
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1070055 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x2a2a2a50d000
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x2a2a2a50d000 thread_stack 0x40000
/rdsdbbin/oscar/bin/mysqld(my_print_stacktrace+0x2c)[0x960ccc]
/rdsdbbin/oscar/bin/mysqld(handle_fatal_signal+0x491)[0x6d7bb1]
/lib64/libpthread.so.0(+0xf5b0)[0x2b299b88d5b0]
/rdsdbbin/oscar/bin/mysqld(_ZN4JOIN7prepareEP10TABLE_LISTjP4ItemjP8st_orderS5_S3_P13st_select_lexP18st_select_lex_unit+0x127d)[0x76e0fd]
/rdsdbbin/oscar/bin/mysqld(_Z12mysql_selectP3THDP10TABLE_LISTjR4ListI4ItemEPS4_P10SQL_I_ListI8st_orderESB_S7_yP13select_resultP18st_select_lex_unitP13st_select_lex+0x86b)[0x776eab]
/rdsdbbin/oscar/bin/mysqld(_Z13handle_selectP3THDP13select_resultm+0x127)[0x777067]
/rdsdbbin/oscar/bin/mysqld[0x5ac17d]
/rdsdbbin/oscar/bin/mysqld(_Z30mysql_execute_command_internalP3THD+0x3c74)[0x7556d4]
/rdsdbbin/oscar/bin/mysqld(_Z21mysql_execute_commandP3THD+0x30)[0x757670]
/rdsdbbin/oscar/bin/mysqld(_ZN18Prepared_statement7executeEP6Stringb+0x354)[0x768914]
/rdsdbbin/oscar/bin/mysqld(_ZN18Prepared_statement12execute_loopEP6StringbPhS2_+0xb6)[0x768ad6]
/rdsdbbin/oscar/bin/mysqld(_Z19mysqld_stmt_executeP3THDPcj+0x153)[0x768d73]
/rdsdbbin/oscar/bin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcj+0x12e4)[0x75a204]
/rdsdbbin/oscar/bin/mysqld(_ZN22OscarSchedulerConsumer7consumeEjj+0xc0)[0x7e5610]
/rdsdbbin/oscar/bin/mysqld(_ZN22OscarSchedulerConsumer5startEv+0xa0)[0x7e5720]
/rdsdbbin/oscar/bin/mysqld(_ZN22OscarSchedulerConsumer11drain_queueEPv+0x6a)[0x7e596a]
/lib64/libpthread.so.0(+0x7f18)[0x2b299b885f18]
/lib64/libc.so.6(clone+0x6d)[0x2b299e658b2d]

Best Answer

Version 5.7 made good strides toward settling on utf8mb4; the next version, 8.0, gets there, with utf8mb4 as the default character set.

But changing anything under the hood is all too likely to cause grief. MySQL learned that lesson many years ago when it "fixed" the handling of ß in the utf8_general_ci collation. The fix broke lots of users' tables, so MySQL came out with the kludge utf8_general_mysql500_ci, which keeps the old, pre-fix ordering.
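To see that kind of difference directly, one approach (a sketch, assuming a server that still accepts the utf8_ collation names and a client that sends UTF-8 bytes) is to evaluate the same comparison under both collations side by side. No results are shown because they depend on the server version.

```sql
-- Assumes the client really sends UTF-8 for the literals below.
SET NAMES utf8;

-- COLLATE binds to the literal just before it and, being explicit,
-- wins over the connection collation, so each comparison is
-- evaluated under the named collation.
SELECT 'ß' = 's' COLLATE utf8_general_ci          AS general_ci,
       'ß' = 's' COLLATE utf8_general_mysql500_ci AS general_mysql500_ci;
```

If the two columns disagree, an index built under one ordering would not match queries evaluated under the other. That is why changing a collation's rules in place breaks existing tables.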

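Likewise, the 767-byte limit on an indexed column (InnoDB's index key prefix limit with the older COMPACT and REDUNDANT row formats) means that blindly changing utf8 to utf8mb4 would break things. A VARCHAR(255) column needs at most 255 × 3 = 765 bytes in utf8, which fits, but 255 × 4 = 1020 bytes in utf8mb4, which does not.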
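A minimal sketch of that breakage, assuming a hypothetical table t_email and forcing ROW_FORMAT=COMPACT so that the 767-byte limit applies even on servers that would otherwise allow longer keys:

```sql
-- Hypothetical table; COMPACT keeps the 767-byte index key prefix limit.
CREATE TABLE t_email (
    email VARCHAR(255) CHARACTER SET utf8 NOT NULL,
    UNIQUE KEY uk_email (email)      -- 255 chars * 3 bytes = 765 bytes: fits
) ENGINE=InnoDB ROW_FORMAT=COMPACT;

-- Blind switch: 255 chars * 4 bytes = 1020 bytes > 767.
-- Expected to fail with a "key too long" / "index column size too large"
-- error; the exact error code varies by version.
ALTER TABLE t_email CONVERT TO CHARACTER SET utf8mb4;

-- One common workaround: shorten the column so the key fits
-- (191 chars * 4 bytes = 764 bytes). In strict mode this is rejected
-- if any existing value is longer than 191 characters.
ALTER TABLE t_email MODIFY email VARCHAR(191) CHARACTER SET utf8mb4 NOT NULL;
ALTER TABLE t_email CONVERT TO CHARACTER SET utf8mb4;
```

Another route is a row format without the 767-byte limit (DYNAMIC or COMPRESSED). Which one is available depends on the server version and its InnoDB settings.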

I predict that utf8 will live on, but only for compatibility, for the sake of those upgrading. Meanwhile, utf8mb4, which is actually UTF-8 compliant, will be the default. (utf8 stores at most 3 bytes per character, so it cannot hold characters such as emoji that need 4.) And there will be a succession of collations, such as utf8mb4_unicode_ci, utf8mb4_unicode_520_ci, and utf8mb4_0900_ai_ci, based on Unicode Collation Algorithm versions 4.0.0, 5.2.0, and 9.0.0. More will follow as the Unicode standard keeps changing.
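To check where a given server stands, a short sketch follows. The comment about 8.0 defaults describes upstream MySQL; a managed service such as RDS/Aurora may override them through its parameter group. The table name t_names is hypothetical.

```sql
-- Server-wide defaults (upstream 8.0 ships with utf8mb4 / utf8mb4_0900_ai_ci)
SELECT @@character_set_server, @@collation_server;

-- Every utf8mb4 collation this server knows about
SHOW COLLATION WHERE Charset = 'utf8mb4';

-- Pinning an explicit collation instead of relying on whatever the
-- current default happens to be (hypothetical table).
CREATE TABLE t_names (
    name VARCHAR(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_520_ci
);
```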

The charset and collation of a database are only the defaults for subsequently created tables. Likewise, a table's charset and collation are only the defaults for subsequently added columns.
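The "defaults only" behavior can be demonstrated in a few statements. This is a sketch using a hypothetical throwaway database, demo_db.

```sql
-- Hypothetical database demo_db, created with a latin1 default.
CREATE DATABASE demo_db CHARACTER SET latin1;
CREATE TABLE demo_db.t_old (c VARCHAR(10));          -- inherits latin1

-- Changes only the database default; t_old is untouched.
ALTER DATABASE demo_db CHARACTER SET utf8mb4;
CREATE TABLE demo_db.t_new (c VARCHAR(10));          -- inherits utf8mb4

-- Same at the table level: this changes t_old's default for new
-- columns, but the existing column c stays latin1.
ALTER TABLE demo_db.t_old CHARACTER SET utf8mb4;
ALTER TABLE demo_db.t_old ADD COLUMN d VARCHAR(10);  -- utf8mb4

SELECT table_name, column_name, character_set_name
  FROM information_schema.`COLUMNS`
 WHERE table_schema = 'demo_db'
 ORDER BY table_name, column_name;
```

The final query should report t_old.c as latin1, while t_old.d and t_new.c report utf8mb4. That is exactly the kind of mix the question describes.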

To change the charset/collation of an entire table, including its existing columns: ALTER TABLE ... CONVERT TO CHARACTER SET ...
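A sketch of doing this for one table and then for a whole schema. The name my_table is hypothetical, <database> is the question's placeholder, and utf8mb4_unicode_ci is just one reasonable choice of collation.

```sql
-- Converts the table default AND every existing character column
-- (hypothetical table name).
ALTER TABLE my_table
    CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

-- Generate one such statement per base table in a schema;
-- review the output before running it.
SELECT CONCAT('ALTER TABLE `', table_schema, '`.`', table_name,
              '` CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;')
         AS stmt
  FROM information_schema.`TABLES`
 WHERE table_schema = '<database>'
   AND table_type = 'BASE TABLE';

-- The database default is separate and needs its own statement.
ALTER DATABASE `<database>` CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
```

CONVERT TO may also change column types to preserve character capacity. For example, a TEXT column converted from a single-byte character set can become MEDIUMTEXT. Stored procedures are not touched by any of this: they keep the settings captured when they were created, so they have to be dropped and recreated.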

I think I am sane. This makes sense to me:

CREATE TABLE People (
    name        ... CHARACTER SET utf8mb4,
    postal_code ... CHARACTER SET ascii
);
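A runnable version of the table above, as a sketch. The column types and lengths are hypothetical, since the answer elides them.

```sql
-- Column types are hypothetical; only the CHARACTER SET choices
-- come from the example above.
CREATE TABLE People (
    name        VARCHAR(100) CHARACTER SET utf8mb4 NOT NULL,
    postal_code VARCHAR(10)  CHARACTER SET ascii   NOT NULL
);

INSERT INTO People (name, postal_code) VALUES ('Zoë', '12345');
```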

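It is even a performance issue: declaring postal_code as utf8mb4 would unnecessarily inflate it by a factor of 4 during certain operations, such as temporary tables and sorts, where MySQL may reserve the maximum number of bytes per character.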
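The worst-case width MySQL plans for is visible in information_schema. A sketch with a hypothetical demo table holding the same VARCHAR(10) in two character sets:

```sql
-- Hypothetical demo table: same declared length, different charsets.
CREATE TABLE charset_width_demo (
    pc_ascii   VARCHAR(10) CHARACTER SET ascii,
    pc_utf8mb4 VARCHAR(10) CHARACTER SET utf8mb4
);

-- CHARACTER_OCTET_LENGTH = declared length * max bytes per character:
-- 10 for ascii (1 byte/char), 40 for utf8mb4 (4 bytes/char).
SELECT column_name, character_maximum_length, character_octet_length
  FROM information_schema.`COLUMNS`
 WHERE table_schema = DATABASE()
   AND table_name = 'charset_width_demo';
```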

As for why things are crashing in Aurora, I don't see enough info here to answer that part of the question. And since I have not heard of anything similar in MySQL or MariaDB, I suspect it may be an Aurora bug. (Aurora is MySQL plus low-level improvements; I have not heard of charset issues there, but who knows?)