I have a MySQL 5.5.31 database which has approx 220 tables – of these 220 tables, around half of them are already using utf8mb4_unicode_ci but the "older" tables are still using latin1_swedish_ci. The fields in the tables are a mix of integer, varchar, longtext, date, datetime and decimal and there are no views or stored procedures.
I now have a requirement to support Russian text so I need to convert the remaining tables to UTF8. Based on my reading, I have opted for utf8mb4_unicode_ci as I believe it offers the broadest support including Chinese.
I have tested as much as I can with individual tables in a test environment without any issues and now need to bite the bullet and do this on my production database – I will, of course, take a full backup just before I make any changes.
I have read of some issues with converting TEXT and CHAR fields as well as some considerations on index sizes for VARCHAR fields but nothing else significant.
I know how to do this but I wanted a final sanity check to make sure there is nothing major I am missing before taking the plunge?
Best Answer
I assume you will do this for each table?
ALTER TABLE tbl CONVERT TO CHARACTER SET utf8mb4;
One potential issue involves the maximum of 767 bytes per column in an INDEX. If you currently have a VARHAR(255) latin1 field in an index, you will need to rethink it.
VARCHAR(191)
if you are sure that 191 will suffice (into the future)INDEX(foo(191))
. This is often not worth doing.Edit
For shrinking to
VARCHAR(191)
, you could write a Stored Procedure to look ininformation_schema
to discover all the VARCHARs, and create and executeSELECT MAX(CHAR_LENGTH(col)) ...
to see if any are over 191 (or even dangerously close to 191). You might be surprised to find a few with max=255, implying that you may already have truncated some text.The other two options won't damage data, but could have performance problems.