Mysql – Broken Slave MySQL : How To Fix MySQL Replication (BINLOG) error in the Slave

gtidmaster-slave-replicationMySQLmysql-5.5mysqlbinlog

Before we start I can give some info about the versions,

Ubuntu – 14.04, MySQL – 5.5

I have successfully configured the Master-Slave replication with two EC2 Ubuntu instances. My master is in Ubuntu 14 and slave is in Ubuntu 18. The replication was successful and the slave was replicating data for the last 3-4 days. But, suddenly my slave(Ubuntu 18) stopped replication with a query error.

Error Message : Last_SQL_Error: Error 'Duplicate entry '11379183' for key 'PRIMARY'' on query. Default database: 'test'. Query: 'INSERT INTO testTable (id,value1,value2,value3,value4,value5,value6) VALUES(22,"5","429438","1592499300","1","8","1")'

Slave_IO_Running: Yes

Slave_SQL_Running: No

I have tried,

STOP SLAVE;

SET GLOBAL SQL_SLAVE_SKIP_COUNTER=1;

START SLAVE;

It has helped me for some time. but the replication is broken again.

How can we resolve this kind of replication errors? Is there any automated method for fixing the broken slave?

Is GTID based replication possible with MySQL 5.5? I think GTID is available from MySQL version 5.6?

Best Answer

HOW TO BYPASS ALL DUPLICATE KEY ERRORS

There are two ways to bypass duplicate key errors

METHOD #1 : Configure it in my.cnf

Add this line under the [mysqld] group header in my.cnf

[mysqld]
slave-skip-errors=1062

then restart mysql.

Once replication begins, all duplicate key errors will be bypassed. Once Seconds_Behind_Master is 0, you are all caught up. However, the data on the Slave is out-of-sync with the Master. What do you do next ???

METHOD #2 : Use an Automated Tool

I am sure you don't have the time to run

STOP SLAVE;
SET GLOBAL SQL_SLAVE_SKIP_COUNTER=1;
START SLAVE;

over and over again. Wouldn't it be nice if an automated tool can do that for you ? Well, such a tool exists. It's called pt-slave-restart.

You would run it something like this:

pt-slave-restart --error-numbers 1062

Put this in a crontab to go off every minute.

* * * * * pt-slave-restart --error-numbers 1062

Even better, you can make a server daemon and not use a crontab

pt-slave-restart --error-numbers 1062 --daemonize

and you do not need to restart mysqld.

Either way, pt-slave-restart will skip that error over and over again until Seconds_Behind_Master is 0. When you are done having the Slave get caught up, stop using pt-slave-restart.

However, you still have the same problem as METHOD #1: the data on the Slave is out-of-sync with the Master. What do you do next ???

OK, WHAT DO YOU DO NEXT ???

You must synchronize the data on the Slave with your Master. Wouldn't it be nice if a utility exists that can do that for you ? Well, such a tool exists. In fact, there are two tools. They are called pt-table-checksum and pt-table-sync.

What do they do ???

You use pt-table-checksum to check on MySQL replication integrity. If you encounter any table that have different checksums between Master and Slave, such a table on the Slave is out-of-sync.

You use pt-table-sync to synchronize MySQL table data.

You can run

pt-table-sync --print > sync_file.txt

to see what will change (this is like a dry run of the real thing)

You then run

pt-table-sync --execute

to perform the actual sync.

PLEASE READ THE PERCONA TOOLKIT DOCUMENTATION.

ALWAYS HANDLE PERCONA TOOLKIT RESPONSIBLY !!!

NOTE : Sadly, there is absolutely nothing you can with MySQL 5.5 and GTID.