MySQL how to change a slave to a new master

MySQLreplication

I have a MySQL active master (lets call it Alpha), a passive master (called Beta), and a slave (called Gamma).

Topology before change:

enter image description here

I am going to switch the active master and the passive master.

I will then need to also change the slave (Gamma) to the new active master (Beta). I am following the instructions in the book "Effective MySQL: Replication Techniques in Depth", page 97, which states before slave (Gamma) is pointed to the new active master (Beta) I should run:

beta> STOP SLAVE SQL_THREAD;
beta> SHOW SLAVE STATUS\G
beta> SHOW MASTER STATUS\G
beta> START SLAVE SQL_THREAD;

It says use the log position from that SHOW SLAVE STATUS on Beta, and use it to run on Gamma:

gamma> START SLAVE UNTIL MASTER_LOG_FILE='{file}', MASTER_LOG_POS={position}

My question is why is that step necessary. I am planning to stop all application writes to the current active master (Alpha), wait as long as necessary for Beta and Gamma replication to fully catch up.

Since they are both caught up with master (Alpha), is that command "START SLAVE UNTIL…" really needed?

Best Answer

You're correct. If the slaves are in sync to the same point on Alpha, START SLAVE UNTIL would serve no purpose.

Effectively, they're having you do is stop the slave on Gamma so that it is at the same point in Alpha's binlog as Beta. In a live, high traffic condition, that can be slightly trickier to do, so I assume that's their motivation for suggesting that approach.

If Gamma and Beta have identical Relay_Master_Log_File and Exec_Master_Log_Pos values, then they are at the same point relative to Alpha, and you can safely switch Gamma to Beta at Beta's current master log file and position.

However... to state the (presumably) obvious, you can't be doing any writing on Beta (other than via replication) around the time you make the cut, because if you're writing to Beta, then Gamma would miss any events that hadn't replicated from Beta to Gamma via Alpha... but the START SLAVE UNTIL wouldn't help you with that, either.

This means you need to actually verify that all three servers are in agreement in all possible directions, so there's nothing outstanding from Beta back to Alpha. Alpha's Exec_Master_Log_Pos (SHOW SLAVE STATUS) needs to match Beta's Position (SHOW MASTER STATUS) and both Gamma and Beta need to match Alpha and each other in the opposite direction (and of course the actual names of the binlogs need to match, not just the positions).

If you want to be super safe and paranoid, verify that everything is in agreement, then issue FLUSH TABLES WITH READ LOCK; on both Alpha and Beta (leaving the connections open to hold that global lock, which is yours once the prompt returns), then verify that everything remains in agreement at the same binlog positions with nothing written to either master server's binlog that the other hasn't already played back. Stop, re-home and restart replication on Gamma, verifying that it is connected and waiting for incoming events. Finally, verify agreement has persisted in each direction between Alpha and Beta once again, then UNLOCK TABLES; on Alpha and Beta.

Note that above, I never stopped the slave on Beta. It's not really necessary if you have a global read lock, since the slave is stalled and binlog coordinates can't advance under that condition, but you could stop it if desired, so long as the coordinates continue to match Alpha.