MySQL Replication Issues: Duplicate (Primary) Key Error and Problems Reading Relay Log with MYSQLDUMP

MySQL

I have a simple Master to Slave MySQL Replication setup for offline backups, and I am looking for some guidance on ways to investigate two potentially related issues:

  1. An error reading the relay log on the replication server every time I run MYSQLDUMP. How concerned should I be about this, and is there a way to prevent it? Example error output is provided below. [Edit: This link https://lists.mysql.com/replication/413 would seem to indicate it is expected behaviour.]
  2. An error replicating data, manifesting as a duplicate primary key error. This has me very concerned: it stops replication dead, it suggests data corruption, and I don't understand the cause. There were 30+ duplicate keys across 2 different tables. The keys are auto-incremented and the code doesn't seem to be doing anything silly (it only inserts records in one place, using a simple INSERT INTO statement for both tables).

All the pertinent configuration information follows:

Environment:

CentOS
MySQL 5.5.6
Replicating database of about 100MB when exported as a MySQL logical copy.
Using the MyISAM engine.

Notes: Upgrading MySQL and the database tables is on my to-do list (honest).

Master Server Config:

server-id=1 
log-bin= mysql-bin 
binlog-do-db=dbtoreplicate 
relay-log = mysql-relay-bin 
relay-log-index = mysql-relay-bin.index 
expire-logs-days=7 
ssl-ca=/path/ca-cert.pem 
ssl-cert=/path/server-cert.pem
ssl-key=/path/server-key.pem
binlog_format = MIXED

Notes: I strongly suspect relay-log isn't needed as this is the Master; I also suspect expire-logs-days won't delete anything until the MySQL server restarts; binlog-do-db is specified because there are lots of databases on the server and only the one is needed for replication; MIXED was chosen because many of the SQL statements used by the software were producing errors warning of possible corruption.
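A side note on that expire-logs suspicion: if old binary logs do pile up on the master before a restart, they can be purged by hand with the standard PURGE BINARY LOGS statement (the 7-day cut-off below just mirrors the expire-logs-days setting above):

-- On the master, in the mysql client: remove binary logs older than 7 days.
-- Only safe once the slave has read past them (check Relay_Master_Log_File in SHOW SLAVE STATUS on the slave).
PURGE BINARY LOGS BEFORE NOW() - INTERVAL 7 DAY;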

Slave Server Config:

[mysqld]
server-id=2
replicate-do-db=dbtoreplicate
log-bin=/home/binlogs/mysql-bin
log_bin_index = /home/binlogs/mysql-bin 
relay_log=/home/binlogs/mysql-relay-bin 
log-slave-updates=TRUE
expire_logs_days=7
binlog_format = MIXED    
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock    
symbolic-links=0

[mysqld_safe]
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid

#SSL for mysql direct connections not used by replication.
[client]
ssl-ca=/path/ca-cert.pem
ssl-cert=/path/client-cert.pem
ssl-key=/path/client-key.pem

Notes: Binary logs are pointed at the /home folder as the system mount is only 50GB.

Cron job running the MYSQLDUMP under investigation:

mysqldump -u root --dump-slave --lock-all-tables --opt dbtoreplicate | gzip > /pathto/backups/dbtoreplicate_`date +\%FT\%T`.sql.gz

Notes: I've since removed --lock-all-tables from this command, on the understanding that --dump-slave stops the slave SQL thread and, like --master-data, locks the tables anyway; I also thought it might be causing the issue. (The revised command is shown below.)
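For reference, a sketch of the revised cron entry with --lock-all-tables removed (same anonymized database name and backup path as above):

mysqldump -u root --dump-slave --opt dbtoreplicate | gzip > /pathto/backups/dbtoreplicate_`date +\%FT\%T`.sql.gz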

Error reported whenever the MySQL dump is run:

180802 12:00:01 [Note] Error reading relay log event: slave SQL thread was killed
180802 12:00:15 [Note] Slave SQL thread initialized, starting replication in log 'mysql-bin.000002' at position 96745620, relay log '/home/binlogs/mysql-relay-bin.000008' position: 30630699

Notes: This message appears in the MySQL error log consistently every hour when the mysqldump is run. It also appears when the command is run from the command line.

The Problem:

180802 12:39:08 [ERROR] Slave SQL: Error 'Duplicate entry '81759' for key 'PRIMARY'' on query. Default database: 'dbtoreplicate'. Query: 'INSERT INTO changednameoftable SET some_id = '212',active = 1, date_created = NOW()', Error_code: 1062

Notes: This error stops replication. The code doesn't appear to be doing anything silly. There were quite a few duplicates (30+), and the error took 5 days to occur. My hunch was that it was caused by mysqldump locking the database and replication failing to restart at the correct co-ordinates (hence the duplicate keys), but on reflection 39 minutes seems a long time for replication to take to discover a sync error and fail.
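One thing I plan to try for tracing where a given key comes from is to search the master's binary log and the slave's relay log for the offending value with mysqlbinlog (file names taken from the log lines above; with statement-based events the auto-increment value shows up as a SET INSERT_ID line):

# On the master: find the event that generated auto-increment value 81759
mysqlbinlog mysql-bin.000002 | grep -n -B 5 "INSERT_ID=81759"

# On the slave: check whether the same event appears more than once in the relay log
mysqlbinlog /home/binlogs/mysql-relay-bin.000008 | grep -c "INSERT_ID=81759"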

Similar problems I've reviewed:

Notes: My set-up worked fine for 5 days before showing the error, so I don't think it's an initial configuration issue; resetting/resyncing/restarting wouldn't seem to solve anything. sql_slave_skip_counter is all very well (see the sketch below), but it won't prevent the issue re-appearing.
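(For completeness, this is the skip-and-restart sequence I mean - it skips a single bad event rather than fixing whatever caused it:)

-- On the slave: inspect the error and the current co-ordinates first
SHOW SLAVE STATUS\G

-- Skip the single offending event and resume
STOP SLAVE;
SET GLOBAL sql_slave_skip_counter = 1;
START SLAVE;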

Advice and thoughts welcomed, especially on discovering how the keys are duplicated.

Best Answer

Revisiting this question some time later while looking for answers to a similar problem, I have some answers for my past self (for MySQL 5.5):

1) Backing up the Slave Replication Server.

Logs such as:

[Note] Error reading relay log event: slave SQL thread was killed
[Note] Slave SQL thread initialized, starting replication in log 'mysql-bin.000001' at position 11111111, relay log '/home/binlogs/mysql-relay-bin.000016' position: 1111111

are normal when starting and stopping a slave. mysqldump does this when the --dump-slave option is specified; see https://dev.mysql.com/doc/refman/5.5/en/mysqldump.html#option_mysqldump_dump-slave, which says:

This option causes mysqldump to stop the slave SQL thread before the dump and restart it again after.

Also see https://bugs.mysql.com/bug.php?id=70275.
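If you want to reassure yourself that the dump is the only thing stopping and starting the SQL thread, you can check the slave threads just after the backup window (the one-liner below simply filters the usual status fields; adjust the credentials as needed):

# On the slave, shortly after the hourly mysqldump has finished:
mysql -u root -p -e "SHOW SLAVE STATUS\G" | grep -E "Slave_IO_Running|Slave_SQL_Running|Seconds_Behind_Master"
# Slave_SQL_Running should be back to Yes and Seconds_Behind_Master should drop back to 0.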

2) Syncing Master and Slave for Replication.

Syncing a Master and Slave is tricky when the Master is live, online and constantly being updated. The MariaDB documentation (https://mariadb.com/kb/en/library/setting-up-replication/), and other answers to this question, recommend:

On the master, flush and lock all tables by running FLUSH TABLES WITH READ LOCK.

Keep this session running - exiting it will release the lock.

Get the current position in the binary log by running SHOW MASTER STATUS:

SHOW MASTER STATUS;

Record the File and Position details. If binary logging has just been enabled, these will be blank.

Now, with the lock still in place, copy the data from the master to the slave. See Backup, Restore and Import for details on how to do this.

Note for live databases: You just need to make a local copy of the data, you don't need to keep the master locked until the slave has imported the data.

Once the data has been copied, you can release the lock on the master by running UNLOCK TABLES:

UNLOCK TABLES;

In short, this means opening two terminals, locking the database with a read-only lock in one, and exporting the database with the other (a sketch of the sequence follows).
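A minimal sketch of that two-terminal sequence (database name anonymized as in the question; note that with a plain mysqldump like this, the File and Position values from SHOW MASTER STATUS have to be fed to CHANGE MASTER on the slave by hand, which is why the --master-data variant below is easier):

-- Terminal 1, mysql client on the master (keep this session open until the copy exists):
FLUSH TABLES WITH READ LOCK;
SHOW MASTER STATUS;   -- record the File and Position values

# Terminal 2, shell on the master: take the logical copy while the lock is held
sudo mysqldump -u root dbtoreplicate > ./backup.sql

-- Terminal 1, once the dump file has been written:
UNLOCK TABLES;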

The MariaDB documents don't cover the backup method itself. Assuming you're performing a logical MySQL backup - remember this question involves a mix of InnoDB and MyISAM database tables - use --master-data without --single-transaction to enforce a global read lock:

sudo mysqldump -u root --master-data database > ./backup.sql
gzip ./backup.sql

On the slave, log in to the mysql command line and run:

stop slave;
reset slave;

Then, from the shell, import the backup (gunzip it first if you compressed it); the CHANGE MASTER co-ordinates written by --master-data are applied as part of the import, so run start slave; in the mysql client once it completes:

mysql -u root -p databasename < ./backup.sql

The RESET SLAVE command (https://dev.mysql.com/doc/refman/5.5/en/reset-slave.html) here is important. The slave usually deletes the existing mysql-relay-bin.0000NN files when the master position is changed - via the CHANGE MASTER statement in the SQL backup - which is good, but it keeps mysql-relay-bin.index. In my experience this leads to duplicate key errors which can pop up hours or days after replication has started.
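As a final sanity check after RESET SLAVE (the path below is just the relay_log location from the slave config above), confirm the stale relay log files really are gone before starting the slave again:

# Shell, on the slave, after running STOP SLAVE; RESET SLAVE; in the mysql client:
ls -l /home/binlogs/mysql-relay-bin*
# The old numbered relay logs should be gone; a fresh mysql-relay-bin.000001 is expected,
# and mysql-relay-bin.index should only reference that fresh file.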