Mysql – Changing from utf8 to utf8mb4

character-setcollationMySQLutf-8

MySql database at the moment has:

  • Character Set: utf8
  • Default Collation Name: utf8_unicode_ci

I want to change it to:

  • utf8mb4
  • utf8mb4_unicode_ci

I would also do the same thing for all tables in the database.

My question is the following:

  • By doing that change is there a possibility that the content of my columns gets messed up in any way? (Cause I am really scared to do it!!!)
  • I have millions of rows in the tables, is there ANY way to do a check after I make the switch that the content of all tables is the OK?
    • Some crazy ideas I am thinking of include:
      • Running some scripts to compare original DB to changed DB
      • Running some scripts to calculate column sizes of original DB to changed DB

Any help or ideas which would guide me to the right direction would be appreciated.

THANK YOU

Best Answer

Main thing to watch altering tables is the index length will increase for utf8 indexed columns which might put them over limit. Adjust URL for your mysql version as some differences apply. These index length exceeded errors will show up immediately on ALTER TABLE so this will be a final bit of the migration.

On validation, generally won't be a problem. Try this on a few tables to get confidence in the idea confidence on smaller tables:

   CREATE TABLE tbl_new LIKE tbl
   ALTER TABLE tbl_new ..  {change to utf8mb4 fields}
   INSERT INTO tbl_new SELECT * FROM tbl
   RENAME TABLE tbl_new TO tbl, tbl TO tbl_old

A SELECT ... tbl JOIN tbl_old ON tbl.pk = tbl_old.pk WHERE tbl.utf_field != tbl_old.utf_field can be used to validate values.

Application behavior can be checked, and you can RENAME back if their are problems. (taking to account updates).