MySQL – Failover in Chain Replication


We have a four-server setup, one master and three daisy-chained slaves, arranged as follows:

A (master) -> B (slave) -> C (slave) -> D (slave)

(Servers B, C, and D are running with log-slave-updates.)

In normal operation everything works as expected: if we add new data to A, it shows up quickly in B, C, and D.

Now we want to create a failure scenario: we shut down A and want to make B the new master:

B (master) -> C (slave) -> D (slave)

It seems like what we want to do is fairly simple: switch B from slave to master.

We are trying to follow the documentation "Switching Sources During Failover"
https://dev.mysql.com/doc/refman/8.0/en/replication-solutions-switch.html

The doc says: "On the replica Replica 1 being promoted to become the source, issue STOP REPLICA | SLAVE and RESET MASTER."

So if we're reading correctly, to switch B from Slave to Master all we have to do is run:

STOP SLAVE
RESET MASTER

Running "STOP SLAVE" causes no issues, but running "RESET MASTER" breaks replication to the downstream slaves C and D. This is the error on C:

Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'Client requested master to start replication from position > file size'

or sometimes:

Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'I/O error reading log event; the first event 'mysql-bin.000001' at 14282060, the last event read from './mysql-bin.000001' at 14282683, the last byte read from './mysql-bin.000001' at 14282683.'

So what is the point of "RESET MASTER", and why does it break the chain? Is there any harm in omitting it? How does one properly do a failover in MySQL chain replication?

Best Answer

C and D are not switching sources: C already replicates from B, and D from C. So nothing needs to be migrated when B takes over from A as the master.

To clear B's record of A having been its master, however, the following may be desirable:

  • (B) STOP SLAVE will stop (B) from trying to get more binlog events
  • (B) RESET SLAVE ALL will make (B) forget about the (A) master.
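  • (B) RESET MASTER is wrong. (B) is the master of (C), and already was before (A) was removed. RESET MASTER deletes (B)'s binary log files and starts a new one, so the file and position (C) is reading from no longer exist, which is what produces error 1236 on (C).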
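
The steps above can be sketched as the following statements, run on B and then checked on C. This is a minimal sketch, not a full failover procedure. It assumes MySQL 8.0, where the SLAVE spellings are still accepted (8.0.22 and later also offer STOP REPLICA, RESET REPLICA ALL and SHOW REPLICA STATUS). The read_only step is an assumption about how B was configured and can be skipped if B was never read-only.

```sql
-- On B (the replica being promoted)
STOP SLAVE;                  -- stop B's I/O and SQL threads; B no longer tries to reach A
RESET SLAVE ALL;             -- forget A: clears the connection settings and relay logs
-- Do NOT run RESET MASTER: C is still reading B's binary log.

-- Assumption: B may have been read-only while it was a replica.
SET GLOBAL read_only = OFF;  -- allow application writes on the new master

SHOW SLAVE STATUS;           -- should now return an empty set on B
SHOW MASTER STATUS;          -- B's current binary log file and position

-- On C (unchanged; it keeps replicating from B)
SHOW SLAVE STATUS;           -- Slave_IO_Running and Slave_SQL_Running should both be Yes
```

Because B's binary log is left untouched, C continues from the file and position it already had, and D continues from C in the same way.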