Mysql – Restoring a MySQL database to a failed master

multi-masterMySQLreplication

I have a master-master configuration in MySQL with two servers. One server should stay live on the network to serve requests (call it server A) and the other should be taken offline to push new code changes (server B).

My idea originally was that after running STOP SLAVE on both servers, that server B could be shut down, updated, and even have a new database schema put in. After this, I thought that I could simply START SLAVE on server B and have the entire database from server A replicated/mirrored back over to server B. However, this is not the case. Restarting the slave and doing a CHANGE MASTER TO (…) and syncing up the log files does not replicate old changes like I want it to: it only replicates new writes from that point on.

I am looking for a way to bring server B up to speed with the latest database from server A, and then have server B continue to replicate changes in a master-master setup. Then I can continue the sequence of server upgrades by doing the same process but keeping server B online only.

Any solutions which require locking the tables will not work since I need to do this change without any downtime. Is this possible?

Best Answer

Slow ALTER in Master-Master

In Master-Master, each server is acting both as a Master and a Slave to the other server.

STOP SLAVE -- When you later do START SLAVE, the stuff that was performed on the Master will be applied to the Slave. If you did, say, a big ALTER on a big table, that ALTER will run, taking a long time. This is probably the "downtime" you are worried about?

The alternative is to do the lengthy process twice.

  1. Stop client usage of B.
  2. On B, turn off writing to the binlog
  3. On B, do the ALTER(s), etc.
  4. On B, turn on writing to the binlog
  5. Wait for replication to catch up. (It was probably blocked by the ALTER.)
  6. Flip -- that is stop talking to A and start talking to B.
  7. On A, (steps 2,3,4,5) binlog off, ALTER, binlog on, catch up

Alternatives

In Galera, this is "RSU" (as opposed to "TOI").

See also pt-online-schema-change -- this tool (percona.com) allows for doing time consuming maintenance on a table with essentially no downtime.

5.7 can now do many ALTERs with ALGORITHM=INPLACE, hence little or no downtime.

Putting the burden on the Code

Keep in mind that some code changes involve schema changes. Sometimes they can be done in a "backward compatible" manner, thereby obviating stuff like this.

In the extreme, you might need to double up the code. That is, have it capable of handing the old schema and the new schema at the same time. This complicates the code, but eliminates coordination problems between code and schema. (The code should eventually be cleaned up to throw out the 'old' variant.)

That last way becomes especially important when you have Sharding, multiple clients, or other cases of "lots of servers to update".