Mysql – Latin1 to UTF8 on large MySQL database, minimal downtime

myisamMySQLreplication

Let me lay out the basic scenario for the problem I'm up against.

  • I'm running MySQL 5.1 and have a single database with 2 large MyISAM tables (9M and 20M rows respectively).
  • Each of these tables has several VARCHAR and TEXT columns which are currently set to the latin1 character set and latin1_ci collation.
  • These tables are actively read / written to, so we want to avoid as much downtime as possible.

I'm confident in actually converting the data between the two character sets (using ALTER TABLE to convert columns to their binary counterparts, then back to the correct original column type in the correct charset). What I'm not sure about is how to go about the conversion without locking the entire table for the duration (which could take hours / days due to the number of rows and indexes).

One idea is to set up replication to a slave; turn off the replication; convert everything; re-enable replication and let the slave catch up; then finally promote the slave to master.

However, this doesn't solve the issue of how to set up replication in the first place. From what I've read, even mysqlhotcopy & xtrabackup still require table locks for MyISAM tables.

Is there another option I've missed, or am I going to have to turn off my entire application in order to set this up? Thanks for any advice.

Best Answer

Promoting a slave would probably be my preferred route. As you pointing out any selects on MyISAM tables would require table level locks. There is one tool that might be able to help, pt-table-sync. It's primary purpose is to find gaps and differences in existing master slave relationships.

A nice thing about it is it does this in nice "chunks". Think of it kind of like antilock breaks. The chunk size is configurable but you could, for example go through doing 1000 rows at a time, minimizing lock times and letting things flow through. I haven't used it to fully repopulate a slave from scratch though although I'd give that a look. Once you have a full copy, do another run to catch new stuff, updates that have come in. Do a flush table with read locks, do a final table sync. run show master status to get the binary log position to start slaving from, unlock tables.

Oh, if you don't have binary logging you'll at least need a master bounce and cnf change unfortunately.

Another approach that might be much simpler depending your write downtime tolerance. You're all myisam, you gave the size in rows by what's the disk footprint in MB or GB? Figure out how long it would take to transfer that size between your machines (hopefully there both in the same local network). You could do flush table with readlock, again still show master status, then just rsync the .MY* files over to your new DB's datadir.

One final alternative, depending on how you're setup: Can you do an LVM or other kind of filesystem snapshot? This would be the best way to minimize downtime. You flush tables with readlock, show master status, start snapshot, then unlock the tables to allow full read write activity to flow through. The difference here is you'll copy the snapshot you started. You'll just need to feel confident the write activity won't exceed the snapshot size you allocate before the copy finishes.

What ever method, before promotion I would verify the character set conversions went as desired. They can be a real pain to reverse.

I would also recommend upgrading to innodb if possible to make it possible to use xtrabackup in a non blocking fashion in the future.