MySQL Replication Issues: Duplicate (Primary) Key Error and Problems Reading Relay Log with MYSQLDUMP

MySQL

I have a simple Master to Slave MySQL Replication setup for offline backups, and I am looking for some guidance on ways to investigate two potentially related issues:

  1. An error reading the relay log on the replication server every time I run MYSQLDUMP. How concerned should I be about this, and is there a way to prevent it? Example error output is provided below. [Edit: This link https://lists.mysql.com/replication/413 would seem to indicate it is expected behaviour.]
  2. An error replicating data, manifesting as a duplicate primary key error. This has me very concerned: it stops replication dead, it suggests data corruption, and I don't understand the cause. There were 30+ duplicate keys across 2 different tables. The keys are auto-incremented and the code doesn't seem to be doing anything silly (it only inserts records in one place, using a simple INSERT INTO statement for both tables).

All the pertinent configuration information follows:

Environment:

CentOS
MySQL 5.5.6
Replicating database of about 100MB when exported as a MySQL logical copy.
Using the MyISAM engine.

Notes: Upgrading MySQL and the database tables is on my to-do list (honest).

Master Server Config:

server-id=1 
log-bin= mysql-bin 
binlog-do-db=dbtoreplicate 
relay-log = mysql-relay-bin 
relay-log-index = mysql-relay-bin.index 
expire-logs-days=7 
ssl-ca=/path/ca-cert.pem 
ssl-cert=/path/server-cert.pem
ssl-key=/path/server-key.pem
binlog_format = MIXED

Notes: I strongly suspect relay-log isn't needed as this is the Master; I also suspect expire-logs-days won't delete anything until the MySQL server restarts; binlog-do-db is specified because there are lots of databases on the server and only the one is needed for replication; MIXED was chosen because many of the SQL statements used by the software were producing errors warning of possible corruption.
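A side note on that expire-logs suspicion: if old binary logs do pile up on the master before a restart, they can be purged by hand with the standard PURGE BINARY LOGS statement (the 7-day cut-off below just mirrors the expire-logs-days setting above):

-- On the master, in the mysql client: remove binary logs older than 7 days.
-- Only safe once the slave has read past them (check Relay_Master_Log_File in SHOW SLAVE STATUS on the slave).
PURGE BINARY LOGS BEFORE NOW() - INTERVAL 7 DAY;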

Slave Server Config:

[mysqld]
server-id=2
replicate-do-db=dbtoreplicate
log-bin=/home/binlogs/mysql-bin
log_bin_index = /home/binlogs/mysql-bin 
relay_log=/home/binlogs/mysql-relay-bin 
log-slave-updates=TRUE
expire_logs_days=7
binlog_format = MIXED    
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock    
symbolic-links=0

[mysqld_safe]
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid

#SSL for mysql direct connections not used by replication.
[client]
ssl-ca=/path/ca-cert.pem
ssl-cert=/path/client-cert.pem
ssl-key=/path/client-key.pem

Notes: Binary logs are pointed at the /home folder as the system mount is only 50GB.

Cron job running the MYSQLDUMP under investigation:

mysqldump -u root --dump-slave --lock-all-tables --opt dbtoreplicate | gzip > /pathto/backups/dbtoreplicate_`date +\%FT\%T`.sql.gz

Notes: I've since removed --lock-all-tables from this command, on the understanding that --dump-slave stops the slave SQL thread and, like --master-data, locks the tables anyway; I also thought it might be causing the issue. (The revised command is shown below.)
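For reference, a sketch of the revised cron entry with --lock-all-tables removed (same anonymized database name and backup path as above):

mysqldump -u root --dump-slave --opt dbtoreplicate | gzip > /pathto/backups/dbtoreplicate_`date +\%FT\%T`.sql.gz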

Error reported whenever the MySQL dump is run:

180802 12:00:01 [Note] Error reading relay log event: slave SQL thread was killed
180802 12:00:15 [Note] Slave SQL thread initialized, starting replication in log 'mysql-bin.000002' at position 96745620, relay log '/home/binlogs/mysql-relay-bin.000008' position: 30630699

Notes: This message appears in the MySQL error log consistently every hour when the mysqldump is run. It also appears when the command is run from the command line.

The Problem:

180802 12:39:08 [ERROR] Slave SQL: Error 'Duplicate entry '81759' for key 'PRIMARY'' on query. Default database: 'dbtoreplicate'. Query: 'INSERT INTO changednameoftable SET some_id = '212',active = 1, date_created = NOW()', Error_code: 1062

Notes: This error stops replication. The code doesn't appear to be doing anything silly. There were quite a few duplicates (30+), and the error took 5 days to occur. My hunch was that it was caused by mysqldump locking the database and replication failing to restart at the correct co-ordinates (hence the duplicate keys), but on reflection 39 minutes seems a long time for replication to take to discover a sync error and fail.
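One thing I plan to try for tracing where a given key comes from is to search the master's binary log and the slave's relay log for the offending value with mysqlbinlog (file names taken from the log lines above; with statement-based events the auto-increment value shows up as a SET INSERT_ID line):

# On the master: find the event that generated auto-increment value 81759
mysqlbinlog mysql-bin.000002 | grep -n -B 5 "INSERT_ID=81759"

# On the slave: check whether the same event appears more than once in the relay log
mysqlbinlog /home/binlogs/mysql-relay-bin.000008 | grep -c "INSERT_ID=81759"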

Similar problems I've reviewed:

Notes: My set-up worked fine for 5 days before showing the error, so I don't think it's an initial configuration issue; resetting/resyncing/restarting wouldn't seem to solve anything. sql_slave_skip_counter is all very well (see the sketch below), but it won't prevent the issue re-appearing.
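(For completeness, this is the skip-and-restart sequence I mean - it skips a single bad event rather than fixing whatever caused it:)

-- On the slave: inspect the error and the current co-ordinates first
SHOW SLAVE STATUS\G

-- Skip the single offending event and resume
STOP SLAVE;
SET GLOBAL sql_slave_skip_counter = 1;
START SLAVE;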

Advice and thoughts welcomed, especially on discovering how the keys are duplicated.

Best Answer

Revisiting this question some time later while looking for answers to a similar problem, I have some answers for my past self (for MySQL 5.5):

1) Backing up the Slave Replication Server.

Logs such as:

[Note] Error reading relay log event: slave SQL thread was killed
[Note] Slave SQL thread initialized, starting replication in log 'mysql-bin.000001' at position 11111111, relay log '/home/binlogs/mysql-relay-bin.000016' position: 1111111

are normal when starting and stopping a slave. mysqldump does this when the --dump-slave option is specified; see https://dev.mysql.com/doc/refman/5.5/en/mysqldump.html#option_mysqldump_dump-slave, which says:

This option causes mysqldump to stop the slave SQL thread before the dump and restart it again after.

Also see https://bugs.mysql.com/bug.php?id=70275.
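If you want to reassure yourself that the dump is the only thing stopping and starting the SQL thread, you can check the slave threads just after the backup window (the one-liner below simply filters the usual status fields; adjust the credentials as needed):

# On the slave, shortly after the hourly mysqldump has finished:
mysql -u root -p -e "SHOW SLAVE STATUS\G" | grep -E "Slave_IO_Running|Slave_SQL_Running|Seconds_Behind_Master"
# Slave_SQL_Running should be back to Yes and Seconds_Behind_Master should drop back to 0.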

2) Syncing Master and Slave for Replication.

Syncing a Master and Slave is tricky when the Master is live, online and constantly being updated. The MariaDB documentation (https://mariadb.com/kb/en/library/setting-up-replication/), and other answers to this question, recommend:

On the master, flush and lock all tables by running FLUSH TABLES WITH READ LOCK.

Keep this session running - exiting it will release the lock.

Get the current position in the binary log by running SHOW MASTER STATUS:

SHOW MASTER STATUS;

Record the File and Position details. If binary logging has just been enabled, these will be blank.

Now, with the lock still in place, copy the data from the master to the slave. See Backup, Restore and Import for details on how to do this.

Note for live databases: You just need to make a local copy of the data, you don't need to keep the master locked until the slave has imported the data.

Once the data has been copied, you can release the lock on the master by running UNLOCK TABLES:

UNLOCK TABLES;

In short, this means opening two terminals, locking the database with a read-only lock in one, and exporting the database with the other (a sketch of the sequence follows).
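A minimal sketch of that two-terminal sequence (database name anonymized as in the question; note that with a plain mysqldump like this, the File and Position values from SHOW MASTER STATUS have to be fed to CHANGE MASTER on the slave by hand, which is why the --master-data variant below is easier):

-- Terminal 1, mysql client on the master (keep this session open until the copy exists):
FLUSH TABLES WITH READ LOCK;
SHOW MASTER STATUS;   -- record the File and Position values

# Terminal 2, shell on the master: take the logical copy while the lock is held
sudo mysqldump -u root dbtoreplicate > ./backup.sql

-- Terminal 1, once the dump file has been written:
UNLOCK TABLES;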

The MariaDB documents don't cover the backup method itself. Assuming you're performing a logical MySQL backup - remember this question involves a mix of InnoDB and MyISAM database tables - use --master-data without --single-transaction to enforce a global read lock:

sudo mysqldump -u root --master-data database > ./backup.sql
gzip ./backup.sql

On the slave, log in to the mysql command line and run:

stop slave;
reset slave;

Then, from the shell, import the backup (gunzip it first if you compressed it); the CHANGE MASTER co-ordinates written by --master-data are applied as part of the import, so run start slave; in the mysql client once it completes:

mysql -u root -p databasename < ./backup.sql

The RESET SLAVE command (https://dev.mysql.com/doc/refman/5.5/en/reset-slave.html) here is important. The slave usually deletes the existing mysql-relay-bin.0000NN files when the master position is changed - via the CHANGE MASTER statement in the SQL backup - which is good, but it keeps mysql-relay-bin.index. In my experience this leads to duplicate key errors which can pop up hours or days after replication has started.
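As a final sanity check after RESET SLAVE (the path below is just the relay_log location from the slave config above), confirm the stale relay log files really are gone before starting the slave again:

# Shell, on the slave, after running STOP SLAVE; RESET SLAVE; in the mysql client:
ls -l /home/binlogs/mysql-relay-bin*
# The old numbered relay logs should be gone; a fresh mysql-relay-bin.000001 is expected,
# and mysql-relay-bin.index should only reference that fresh file.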