MySQL Replication – Handling ‘Duplicated Entry for PRIMARY Key’ Error

MySQLmysqldumpreplication

Could you please help me to understand why I'm receiving 'Duplicated entry for PRIMARY key' on a slave server after a full re-sync.

Basically 'mysqldump' was running almost whole night and then the restore process took a couple of hours so when I've started the slave it was ~63874 seconds behind master.

The slave server is read only (read_only) and there was no any writes during re-sync process so I don't understand why there are duplicated keys.

Binary log format is set to MIXED on master.

Command used to backup DB:

mysqldump --opt --single-transaction -Q --master-data=2 db | bzip2 -cz > db.sql.bz2

The slave is replicating only one database from master (db -> db_backup) with the following options:

replicate-wild-do-table = db_backup.%
replicate-rewrite-db = db->db_backup

Best Answer

ASPECT #1 : Replication

I don't think that

replicate-wild-do-table = db_backup.%
replicate-rewrite-db = db->db_backup

belong together.

Other people have wondered about this as well

Jun 14, 2012 : Configured MySQL replication but its not working
Jun 13, 2012 : What is the difference between 'replicate-rewrite-db' and 'replicate-do-db ' during mysql replication ?
May 31, 2012 : Mysql replicate-rewrite-db not working

The problem stems from the order replication rules are processed. According to the MySQL Documentation on Replication Rules:

If any --replicate-rewrite-db options were specified, they are applied before the --replicate-* filtering rules are tested.

Even the MySQL Documentation on replicate-rewrite-db says:

The database name translation is done before the --replicate-* rules are tested.

The replicate-wild-do-table is enforced after the rewrite. It would not be surprising if this ordering somehow imposed an INSERT into a table that has data already.

You are probably asking how did the data get there ?

ASPECT #2 : mysqldump

Doing mysqldump --single-transaction would seem to be the greatest way to point-in-time dumps of data. Unfortunately, mysqldump --single-transaction has an Achilles' Heel : ALTER TABLE. If a table is subject to any ALTER TABLE commands, such as a DROP TABLE and CREATE TABLE, that can break the integrity of the transaction the mysqldump was trying to do the dump in. Truncating a table (which is DDL in the MySQL Universe) and dropping and adding indexes can also be as disruptive.

You can find more information on that from MySQL Performance Blog's Best kept MySQLDump Secret. I actually addressed this point in a past question describing 12 commands that can break the integrity of a mysqldump's transaction : MySQL backup InnoDB

CAVEAT

Personally, I would switch binary-format to STATEMENT
Like I said in MySQL backup InnoDB, you should have another slave (without any replication filters) dedicated to doing mysqldumps.
63000+ seconds behind? You should look into doing parallel table dumps : How can I optimize a mysqldump of a large database?

EPILOGUE

One or both of the aspects may have contributed to letting a row slip in during the mysqldump that should not have existed due to either the rewrite rules or the isolation of the mysqldump being overridden.

SUGGESTIONS

I would do a mysqlbinlog dump of all the relay logs since the start of the mysqldump to see all INSERTs that the Slave will process and see if those rows already exist on the Slave. If they do, you could probably do two things:

1 : Skip all the Duplicate Key errors

Simply add this to my.cnf on the Slave

[mysqld]
slave-skip-errors=1062
skip-slave-start

and restart mysql. Then, run START SLAVE;

all the duplicate-key errors will get bypassed. When Seconds_Behind_Master gets to 0, remove those lines and restart mysql.

2 : Download percona tools

The tools you need are

Use these to find the differences in the Slave, and then correct them

OBSERVATION #1

You mentioned Ask developers put all the temporary tables into a separated database

If your developers are using CREATE TEMPORARY TABLE commands to create temporary tables, they need to use CREATE TABLE instead. Here is why:

With MySQL Replication processing a temporary table, this is what occurs

1) Master run CREATE TEMPORARY TABLE
2) Command inserted into binary log
3) Replication copies this over to the Slave's Relay Logs via I/O Thread
4) Slave SQL Thread runs CREATE TEMPORARY TABLE
5) Slave processes data with that temp table

Once in a while, someone may run STOP SLAVE; to run a backup. If STOP SLAVE; is issued just after step 4, the temp created disappears and so does its data. When you run START SLAVE; Replication breaks instantly complaining the table does not exist. This is normal because when a DB Connections terminates deliberately or accidently, all temp tables opened using CREATE TEMPORARY TABLE in the DB session are dropped. Running STOP SLAVE; kill the SQL thread who was holding opening the temp table.

The only workaround for this is to create the table using CREATE TABLE instead of CREATE TEMPORARY TABLE. When run STOP SLAVE;, the temp table you created normally does not disappear.

I have seen this happen maybe 10 times in my DBA career. Fixing it using the binary logs to find out the name of the temp tables, to create those tables using CREATE TABLE, then starting replication up was the only maintenance possible without just brute force copying the master.

OBSERVATION #2

mk-table-sync only works on tables with primary keys and/or unique keys. It works maybe 99% of the time. I have seen instances where the checksum of a table on the master and slave were different. I would run mk-table-sync, there were still differences (Of course, I was doing mk-table-sync in circular replication with 3 masters, which can be a little dangerous. Using it in Master/Slave is far more stable)

OBSERVATION #3

You mentioned There is some unsafe queries. Does it get some problems with MIXED based replication?

It depends. The most popular unsafe query is any UPDATE or DELETE that uses ORDER BY ... LIMIT. With SBR, this could possibly cause MySQL to UPDATE or DELETE rows from a table on the Slave in a different order tham that of the Master. With RBR, I believe the exact changes in a row are more identifiable to UPDATE or DELETE on the Slave.

SOLUTION : Avoid using unsafe queries. Then, you will not worry !!!

OBSERVATION #4

I just read your second link. ROFL !!! I am familiar with the poster of the answer.