I have a system which replicates one database to four slave servers. Every now and then, when traffic is high, one or more of the slave servers hits a duplicate insert error and the slave stops running.
When this happens, I have two choices – I can either SET GLOBAL SQL_SLAVE_SKIP_COUNTER
or I can delete the offending row in the slave. Both seem to work and my logic says that given something happened to cause this problem, there is a possibility that the data in the slave is corrupted. Given that this can only happen on INSERTs, by deleting the row I guarantee the slave data will match the master once replication resumes. By skipping, if the data for that row is corrupted in the slave, it will remain corrupted.
Am I missing anything?
Further, given that this happens once every couple of months on two specific tables, is there any reason I shouldn't automate a process that triggers when this error is encountered, deletes that row in the slave and re-starts the slave?
EDIT: MySQL 5.5.29 and statement replication I believe.
Best Answer
You don't have to use any trigger, If you want to safely skip duplicate key errors, please add this to my.cnf on slaves
Then restart mysql
Check if all the slaves have different serverIDs