MySQL – Error 1236 on Slave After Master-Master Failure

multi-master, MySQL, mysql-5.6, replication

I have a master (master 1) that replicates to another master (master 2), which then replicates to its slave.

So master 2 had an issue with the binary logs:

Relay log read failure: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave.

I've run into this before and resolved it by running the commands from this thread: "expire_logs_days directive requires change master?" (I'll ask a separate question later about why it keeps occurring; that's not my current issue.)

If you're interested, here's the relevant bit from that thread that got it working:

stop slave;
reset slave;
change master to master_log_file='...' , master_log_pos=...;
start slave;

So now master 2 is healthy and replicating from master 1. This time, though, the slave of master 2 broke as well.

The error I'm getting is:

Got fatal error 1236 from master when reading data from binary log: 'Client requested master to start replication from position > file size; the first event 'mysqld-bin.000397' at 244145356, the last event read from './mysqld-bin.000397' at 4, the last byte read from './mysqld-bin.000397' at 4.'

When I run SHOW BINARY LOGS; on the slave I get:

ERROR 1381 (HY000): You are not using binary logging

What happened to my slave? Why's it not working with the logs anymore? The master still has the log file so I figured running the stop, reset, change, start would resolve the issue (because it would re-request the logs) but it didn't.

Best Answer

I'll bet sync_binlog was turned off (sync_binlog=0, which is the default in 5.6).

With it off, the binlog entries just before the crash may not have been flushed to the binlog file, even though they have been sent to the Slave.

Sending replication data from Master to Slave:

  1. Write to the table on the Master.
  2. Buffer up the write to the binlog.
  3. Optionally flush the buffer to the binlog. -- Controlled by sync_binlog
  4. Send the query to the Slave(s).

With sync_binlog=0, there is a real chance that, after a crash, the Master's binlog will be shorter than what the Slave thinks it should be.
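To make the failure mode concrete, here is a deliberately simplified toy model of the four steps above in Python. It is not MySQL code: the class and method names (ToyMaster, commit, crash) are made up for illustration, and it assumes a crash that loses everything not yet flushed to disk.

```python
# Toy model of the four replication steps; not MySQL internals.
# ToyMaster, commit() and crash() are hypothetical names for illustration.

class ToyMaster:
    def __init__(self, sync_binlog):
        self.sync_binlog = sync_binlog  # 0 = never flush, 1 = flush on every commit
        self.durable_size = 4           # bytes safely on disk (4-byte file header)
        self.written_size = 4           # bytes written, possibly still only buffered

    def commit(self, event_size):
        # Steps 1-2: apply the write and append the event to the binlog.
        self.written_size += event_size
        # Step 3: optionally flush the binlog to disk.
        if self.sync_binlog == 1:
            self.durable_size = self.written_size
        # Step 4: the slave receives the event and remembers this position.
        return self.written_size

    def crash(self):
        # Assumption: anything not flushed is lost in the crash.
        self.written_size = self.durable_size
        return self.durable_size


def simulate(sync_binlog):
    master = ToyMaster(sync_binlog)
    slave_pos = 4
    for size in (100, 250, 80):
        slave_pos = master.commit(size)
    file_size = master.crash()
    return slave_pos, file_size


print(simulate(0))  # slave position is past the end of the file
print(simulate(1))  # slave position equals the file size
```

In the sync_binlog=0 run the slave ends up asking for a position larger than the file that survived, which is the "position > file size" situation in the 1236 error.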

When the Slave-Master connection is reestablished, the Slave tries to pick up where it left off. With sync_binlog=1, that would be at the exact end of some binlog, and the Slave would simply move on to the next binlog. The manual CHANGE MASTER simulates that.

So: CHANGE MASTER to position 0 (or 4) of the next binlog (bump the file number by 1).
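As a rough sketch of "bump the number by 1", the small Python helper below derives the next binlog name and prints the statements to run on the slave. The function names are hypothetical, and it assumes the next file really exists on the master, so check SHOW BINARY LOGS there first.

```python
def next_binlog(name):
    """Return the binlog file name that follows `name` (sequence number + 1)."""
    base, seq = name.rsplit(".", 1)
    # Keep the zero-padded width of the original sequence number.
    return f"{base}.{int(seq) + 1:0{len(seq)}d}"


def change_master_sql(current_file, pos=4):
    # 4 is the offset of the first event, right after the file's 4-byte header.
    return (
        "STOP SLAVE; "
        f"CHANGE MASTER TO MASTER_LOG_FILE='{next_binlog(current_file)}', "
        f"MASTER_LOG_POS={pos}; "
        "START SLAVE;"
    )


# File name taken from the error message in the question.
print(change_master_sql("mysqld-bin.000397"))
```

Any events the slave applied that never reached the master's disk are not recovered by this; it only gets replication moving again from the start of the next file.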

(I have never used RESET SLAVE; I see no reason for it.)