Mysql – Master and slave tables with different character sets – will replication break

character-setMySQLreplication

We have some huge tables in our MySQL 5.6 database that have latin-1 as their character set, and want to convert them all to UTF-8 (along with converting the database's global character set). The necessary ALTER TABLE statements take, between them, about an hour to run. We'd like to minimise our downtime.

Our plan is to set up replication, make the encoding changes on the slave database, let the databases sync up, and then briefly take down our webserver and point it at the slave database.

Will this work, or is there a risk of it causing data corruption in the slave?

My concern about this stems from there being two obvious possible approaches for how string values (that are being INSERTed or UPDATEed) might be stored in MySQL's binlog and sent across to the slave. Either…

  1. The values are stored as unicode character sequences (i.e. the binlog knows what encoding they are encoded in sends that information to the slave so that it can convert the values to match the character set of the target column on the slave.)

    or…

  2. The encoded values are just stored as byte sequences, with no record of what the character set was.

The documentation doesn't shed any light on which of the above models MySQL uses.

If #1 is true, then our plan will work fine.

However, if #2 is true, then trying to replicate queries from the master to the slave is going to go wrong in some way if one of the columns involved has a different character set. For instance, once we've converted our tables from latin-1 to UTF-8 on the slave, the slave might try to store latin-1-encoded strings passed across from the master in a UTF-8-encoded column, resulting in either the stored text not being validly encoded in UTF-8 or representing the wrong characters.

The MySQL docs on Replication and Character Sets shed little light on the problem; they discuss potential issues resulting from differences in the global character set of each server, but don't touch on potential problems caused by differences in table or column character sets at all.

Is our plan sound, or are we at risk of ending up with corrupt data on the slave?

Best Answer

https://dev.mysql.com/doc/refman/5.7/en/replication-features-differing-tables.html describes the supported variations in table definitions.

For one, it appears that this is supported with statement-based replication:

When using statement-based replication, a simple rule of thumb to follow is, “If the statement run on the master would also execute successfully on the slave, it should also replicate successfully”.

With row-based replication, only certain type conversions are supported, which are affected by the slave_type_conversions variable. The key quotes according to my reading are:

[Conversions are supported between] any of the string types CHAR, VARCHAR, and TEXT, including conversions between different widths.

However:

[Row-based] Replication between columns using different character sets is not supported.