RDS for MySQL 5.6 supports "in" and "out" migration strategies for live workloads using MySQL's native replication, but RDS for MySQL 5.5 only supports "in" migration... which is unfortunate, because essentially every possible combination except the one that you need is available. That would have been nice. :(
If your dump download took 3 hours and your upload took 4+ hours, the good news is that you only need the 4+ hours, because you can do both simultaneously... and depending on whether you did that testing from EC2 or from your own network, possibly even less.
What I suggest is tying mysqldump to mysql using a pipeline, so that you are simultaneously reading from the old server and writing to the new server. To make this work properly, though, we need to add a couple of twists. And then a couple of extra twists make it almost fun.
I would think your ideal scenario would be to spin up a disposable Linux machine in EC2 in the same region, if you don't already have one available... and preferably in the same availability zone as your servers, because this gives you the maximum potential bandwidth among the 3 components involved and that's one less thing to slow you down.
Next, on the source server where the data is coming from, you need to issue SET GLOBAL NET_WRITE_TIMEOUT = 600;. Of course, you can hardly SET GLOBAL anything on RDS, so you have to change this via the RDS Parameter Group. NET_WRITE_TIMEOUT is the amount of time MySQL Server will wait before aborting an attempt to write to a client that is blocking. When you're piping the output from a dump directly into another server, the destination server is going to periodically block, and sometimes for longer than the default 60 seconds. Raising the value prevents the source server from timing out when the destination server takes too long to process one of the massive inserts that mysqldump builds. There's a companion variable, NET_READ_TIMEOUT, that you might want to set to a similar value, although I don't think that one has any real relevance in a dump scenario. If you have MyISAM tables, you need to set the timeout even higher, because the ALTER TABLE ... ENABLE KEYS can take forever to run after a table is fully restored on the target... but hopefully, you don't.
The value "600" (10 minutes) is somewhat arbitrary, but it's what I use for operations like this.
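For reference, this can be changed in a DB parameter group from the AWS CLI; a sketch, assuming the source instance is attached to a custom parameter group here called my-db-params (the group name is a placeholder — you can't modify the default group):

```shell
# Raise net_write_timeout to 600 seconds in the (hypothetical) parameter
# group attached to the source RDS instance; net_write_timeout is a
# dynamic parameter, so ApplyMethod=immediate takes effect without a reboot.
aws rds modify-db-parameter-group \
  --db-parameter-group-name my-db-params \
  --parameters "ParameterName=net_write_timeout,ParameterValue=600,ApplyMethod=immediate"
```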
As mysqldump is reading from a remote server and writing to a pipe into the mysql command line client, when mysql sends a big insert to the destination, it stops reading from the pipe, which blocks mysqldump from writing to the pipe, which in turn blocks mysqldump from reading from the source's socket until the destination server is ready for more data. In addition to the timeouts this causes, it also wastes a lot of time for several different reasons... mysqldump spends a lot of time discovering table structures and starting its SELECT * queries, and we want to avoid all of that wasted time. The ticket here is setting up a double buffer using the unix dd utility.
If we invoke dd specifying a large output block size without specifying an input block size, it will read from its input with a small block size (default 512 bytes) until it has read the amount of data specified by its output block size, then block its input and flush that data out its output in a single massive write.
dd obs=16384K
This will cause dd to accept 16 MB of data as fast as you can feed it, and then, once it can flush that block to its output, it will then accept more.
If we stack two of them, we get a double buffer. :)
mysqldump [options] | dd obs=16384K | dd obs=16384K | mysql [options]
Using this construct, we can always keep between 16 and 32 megabytes of data in RAM, in flight, at all times, between the origin server and the destination server... so we are essentially guaranteed that as soon as the destination server is ready for data, we have data immediately ready for it, with no waiting for the origin... and the origin can keep itself busy keeping our buffers full.
16 MB isn't really a large amount of data, so an even larger buffer might be desirable, but we don't want to make these too big: when both are full, the origin has to wait until the destination reads the entire 16 MB from the 2nd buffer, at which point the 1st buffer can flush into the 2nd, unblock, and start accepting more data from the origin. However, we can keep more data in flight if desired by chaining even more copies of dd. They will use 100% of the specified memory, but they use very little CPU. I usually chain at least 4 of them.
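You can sanity-check this plumbing with throwaway data instead of a real dump — a sketch, using smaller 1 MB buffers and /dev/zero as the "origin", showing that chained dd buffers reblock the stream without losing a byte:

```shell
# Push 8 MiB of zeros through four chained dd buffers and confirm the
# same number of bytes comes out the other end; dd's transfer stats go
# to stderr, which we discard.
head -c 8388608 /dev/zero \
  | dd obs=1024K 2>/dev/null \
  | dd obs=1024K 2>/dev/null \
  | dd obs=1024K 2>/dev/null \
  | dd obs=1024K 2>/dev/null \
  | wc -c
```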
Incidentally, this same hack will also get you a faster mysqldump-pipe-to-gzip (bzip2, xz, etc.) if you stick some dd in between, because it significantly increases the likelihood that you'll always have data available for your compression program to crunch on, keeping the compressor's processor utilization maxed out... which means the crunching gets done in less wall-clock time.
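The same sort of sanity check works for the compression variant — a sketch with throwaway data in place of a real dump, round-tripping through gzip and gunzip to show the buffer doesn't alter the stream:

```shell
# Feed 1 MiB of zeros through a dd buffer into gzip, decompress it
# again, and count the bytes; the dd stage only changes the write
# block sizes, never the data.
head -c 1048576 /dev/zero | dd obs=1024K 2>/dev/null | gzip | gunzip | wc -c
```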
You should also probably use the --compress option on both mysqldump and mysql so that the data is compressed over the wire in both directions. Compression isn't free, but this is just zlib inflate/deflate, so it isn't a very significant CPU expense. The slowest part of an operation like this, as you noted, is on the destination, and all the destination has to do is the less-costly decompression.
Add the --verbose option to mysqldump so that you can watch the data flowing.
And, for a little extra sweetness, a couple of throughput meters:
$ sudo apt-get install pv
This will allow you to keep an eye on the proceedings with output on your terminal resembling this example:
inbound: 128MB 0:07:02 [ 0B/s] [ 364kB/s]
outbound: 96MB 0:06:00 [ 154kB/s] [ 334kB/s]
If you know approximately how big your dump is, you can use additional options to give pv a hint of that size, and it will calculate progress in percent.
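For example (a sketch — the 50G figure is a made-up estimate; substitute your own), pv's -s option supplies the expected total so the progress display can show percent complete and an ETA:

```shell
# Hypothetical: tell pv to expect roughly 50 GB of dump data so the
# -p progress bar can render a percentage alongside the rate meters.
mysqldump --compress --verbose [other options] |
pv -pterabc -s 50G -N inbound |
mysql --compress [other options]
```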
Your final command looks like this (as a single line, formatted here as multiple lines for clarity):
mysqldump --compress --verbose [other options] |
pv -pterabc -N inbound |
dd obs=16384K |
dd obs=16384K |
dd obs=16384K |
dd obs=16384K |
pv -pterabc -N outbound |
mysql --compress [other options]
The [other options] of course are the usual authentication credentials and other options you used with mysqldump and with the mysql client when you did your previous restore.
If you have multiple schemas, or want to otherwise go to the trouble of running parallel copies of mysqldump | mysql, that might be worth a shot to decrease your total time as well... restoration speed is a function of CPU and I/O capacity on the destination server, and with RDS it seems like the combined throughput of two parallel threads might well beat a single thread.
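A sketch of that parallel approach, assuming two schemas named schema_a and schema_b (placeholder names) and a single dd buffer per pipeline for brevity... each copy runs in the background, and wait blocks until all of them finish:

```shell
# Hypothetical parallel copy: one mysqldump | mysql pipeline per schema,
# run concurrently; the schema names and [other options] are placeholders.
for db in schema_a schema_b; do
  ( mysqldump --compress [other options] "$db" |
      dd obs=16384K 2>/dev/null |
      mysql --compress [other options] "$db" ) &
done
wait
```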
I think RDS also has an official, supported option to temporarily disable binary logging while restoring a dump file. You might explore that as well, as it might improve restoration time... so long as you turn it back on when you're done.
Best Answer
You don't need to dump and import the databases; a copy of the folder /var/lib/mysql should be OK for backup.
I suggest you upgrade to MySQL 5.7, as it's the currently supported release and has better performance.
Here are the steps; I have implemented this many times with no issues.
Back up your databases before the upgrade (stop MySQL first so the copy of the data directory is consistent, and use -a so the directory is copied recursively with ownership and permissions preserved)
$ service mysql stop
$ cp -a /var/lib/mysql /var/lib/mysql.original
Download the MySQL 5.7 RPM
$ wget http://repo.mysql.com/mysql57-community-release-el7.rpm -P /tmp/
Remove the MySQL-Community RPM that contains MySQL 5.5
$ yum remove mysql-community-release
Install the MySQL 5.7 RPM
$ rpm -ivh /tmp/mysql57-community-release-el7.rpm
Update the MySQL Repository to 5.7
$ yum update mysql
Let MySQL know that it received an upgrade
$ service mysql start
$ mysql_upgrade