MySQL Replication – How to Handle Master-to-Master Replication After a Disk Space Outage

Tags: mysql, replication

We have master-to-master replication between two nodes, A and B, both in a virtual environment. Initially there was an outage (a disk space issue) on node A that stopped replication. Application-level traffic was diverted to node B, and the node A data store was relocated to make space available.

Node A started successfully and is back up. Replication was then restarted, but node B got an error while syncing with node A, and the binary log on node B got corrupted. The root cause is still a mystery, but while analyzing the logs we found some duplicate entries. Here is the error log:

[ERROR] Error reading packet from server: Client requested master to start replication from impossible position ( server_errno=1236).
[ERROR] Slave I/O: Got fatal error 1236 from master when reading data from binary log: 'Client requested master to start replication from impossible position', Error_code: 1236
111014 20:25:48 [Note] Slave I/O thread exiting, read up to log 'mysql-bin.001067', position 183468345.

How do we handle replication in this situation? Can we skip the current binary log and restart replication from the next available binary log and its position? Is it a good idea to sync node B with node A like this:

CHANGE MASTER TO
  MASTER_HOST='XX.XX.XXX.XXX',
  MASTER_USER='replicate',
  MASTER_PASSWORD='slave',
  MASTER_PORT=3306,
  MASTER_LOG_FILE='mysql-bin.001025',
  MASTER_LOG_POS=4,
  MASTER_CONNECT_RETRY=10;

The main reason I am posing this scenario is that I don't want to restore the database and rebuild it from scratch, since the database backup on node B is around 80 GB. How can I repair the replication?

Best Answer

There are two things that need to be mentioned here:

1) The word impossible is a dead giveaway: 'Client requested master to start replication from impossible position'. Here is what this essentially means:

A position in a binary log is simply a byte offset into that file, so the highest valid position in a log equals the file's size. The slave wants to read from mysql-bin.001067 at position 183468345. Since the word 'impossible' appears in the message, the master's mysql-bin.001067 must be smaller than 183468345 bytes (you can verify this on node A with the quick check shown after the CHANGE MASTER TO below). To get replication going again, skip to the next binary log:

CHANGE MASTER TO 
  MASTER_HOST='XX.XX.XXX.XXX', 
  MASTER_USER='replicate', 
  MASTER_PASSWORD='slave', 
  MASTER_PORT=3306, 
  MASTER_LOG_FILE='mysql-bin.001068', 
  MASTER_LOG_POS=NewPos, 
  MASTER_CONNECT_RETRY=10;
START SLAVE;
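
Before running the statements above, it is worth confirming the diagnosis on node A. A minimal check, using only standard SHOW statements (the file name and position come from the error message in the question):

SHOW BINARY LOGS;   -- lists every binary log on node A with its size in bytes
SHOW MASTER STATUS; -- shows the log node A is currently writing to and its position

If mysql-bin.001067 really is smaller than 183468345 bytes, the position is impossible and skipping forward is the right move; if it is not, investigate before skipping, because you would be throwing events away.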

NewPos is the position immediately after the format description event at the start of a binary log, so it depends on the version of MySQL (you can also read it straight from node A; see the sketch after this list):

  • MySQL 5.6 : NewPos = 120
  • MySQL 5.5 : NewPos = 107
  • MySQL 5.1 : NewPos = 106
  • MySQL 5.0 : NewPos = 98
  • MySQL 4.x : NewPos = 98
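
If you would rather read the value directly from node A (this assumes mysql-bin.001068 already exists there), the End_log_pos of the first event in that log is the number to use:

SHOW BINLOG EVENTS IN 'mysql-bin.001068' LIMIT 2;
-- Use the End_log_pos of the Format_desc row as MASTER_LOG_POS.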

2) You could just look into using data synchronization tools from Percona.

I have used these tools for about two years; they help you hunt down differences between tables on the master and the slave, even if the table on the master is InnoDB and the same table on the slave is MyISAM (provided both tables have the same structure).

Replication must be on while running these tools.

BTW, Percona has a newer set of tools called the Percona Toolkit; they forked their own Maatkit tools to build improved versions. The tools you want are pt-table-checksum and pt-table-sync.
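
As a rough sketch of the workflow with the current toolkit: run pt-table-checksum against the master, query its result table on the slave to see which tables drifted, then let pt-table-sync repair them. Assuming the toolkit's default result table, percona.checksums (adjust the name if you point --replicate somewhere else), a query like this on the slave flags the tables that differ:

SELECT db, tbl, COUNT(*) AS bad_chunks
FROM percona.checksums
WHERE master_cnt <> this_cnt
   OR master_crc <> this_crc
   OR ISNULL(master_crc) <> ISNULL(this_crc)
GROUP BY db, tbl;

Any table that shows up here is a candidate for pt-table-sync.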