I have a simple Master-to-Slave MySQL replication setup for offline backups, and I'm looking for guidance on ways to investigate two potentially related issues:
- An error reading the relay_log on the replication server every time I run mysqldump. How concerned should I be about this, and is there a way to prevent it? Example error output is provided below. [Edit: This link https://lists.mysql.com/replication/413 would seem to indicate it is expected behaviour.]
- An error replicating data, manifesting as a duplicate primary key error. This has me very concerned as it stops replication dead, signifies data corruption, and I don't understand the cause. There were 30+ duplicate keys in 2 different tables. The keys are auto-incremented and the code doesn't seem to be doing anything silly (it only inserts records in one place, using a simple INSERT INTO statement for both tables).
All the pertinent configuration information follows:
Environment:
CentOS
MySQL 5.5.6
Replicated database of about 100MB when exported as a MySQL logical copy.
Using the MyISAM engine.
Notes: Upgrading MySQL and the database tables is on my to-do list (honest).
Master Server Config:
server-id=1
log-bin=mysql-bin
binlog-do-db=dbtoreplicate
relay-log = mysql-relay-bin
relay-log-index = mysql-relay-bin.index
expire-logs-days=7
ssl-ca=/path/ca-cert.pem
ssl-cert=/path/server-cert.pem
ssl-key=/path/server-key.pem
binlog_format = MIXED
Notes: I strongly suspect relay-log isn't needed as this is the Master; I also suspect expire-logs-days won't delete anything until the MySQL server restarts; binlog-do-db is specified because there are lots of databases on the server and only the one is needed for replication; MIXED was chosen because many of the SQL statements used by the software were producing warnings about possible corruption.
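As an aside, if expire-logs-days turns out not to purge old binary logs until a restart, they can be removed by hand; a minimal sketch using the standard PURGE BINARY LOGS statement, with an illustrative 7-day window to match the config above:

-- Run on the Master from the mysql command line.
-- Before purging, check SHOW SLAVE STATUS on the Slave to confirm it
-- has already read past the files about to be deleted.
PURGE BINARY LOGS BEFORE NOW() - INTERVAL 7 DAY;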
Slave Server Config:
[mysqld]
server-id=2
replicate-do-db=dbtoreplicate
log-bin=/home/binlogs/mysql-bin
log_bin_index = /home/binlogs/mysql-bin
relay_log=/home/binlogs/mysql-relay-bin
log-slave-updates=TRUE
expire_logs_days=7
binlog_format = MIXED
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
symbolic-links=0
[mysqld_safe]
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid
#SSL for mysql direct connections not used by replication.
[client]
ssl-ca=/path/ca-cert.pem
ssl-cert=/path/client-cert.pem
ssl-key=/path/client-key.pem
Notes: Binary logs are pointing at the /home folder as the system mount is only 50GB.
Cron job (run hourly) for the mysqldump under investigation:
mysqldump -u root --dump-slave --lock-all-tables --opt dbtoreplicate | gzip > /pathto/backups/dbtoreplicate_`date +\%FT\%T`.sql.gz
Notes: I've since removed --lock-all-tables from this command, on the understanding that --dump-slave stops the slave anyway and, like --master-data, locks the tables anyway. I also thought it might be causing an issue.
Error reported whenever MySQL dump run:
180802 12:00:01 [Note] Error reading relay log event: slave SQL thread was killed
180802 12:00:15 [Note] Slave SQL thread initialized, starting replication in log 'mysql-bin.000002' at position 96745620, relay log '/home/binlogs/mysql-relay-bin.000008' position: 30630699
Notes: This message appears in the MySQL error logs consistently every hour, when the mysqldump cron job runs. It also appears when the command is run from the command line.
The Problem:
180802 12:39:08 [ERROR] Slave SQL: Error 'Duplicate entry '81759' for key 'PRIMARY'' on query. Default database: 'dbtoreplicate'. Query: 'INSERT INTO changednameoftable SET some_id = '212',active = 1, date_created = NOW()', Error_code: 1062
Notes: This error stops replication. The code doesn't appear to be doing anything silly, and there were quite a few duplicates (30+). The error took 5 days to occur. My hunch was that it was caused by mysqldump locking the database and replication failing to restart at the correct co-ordinates (hence the duplicate keys), but on reflection 39 minutes seems a long time for replication to take to discover a sync error and throw an error.
Similar problems I've reviewed:
Notes: My set-up worked fine for 5 days before showing the error, so I don't think it's an initial configuration issue, and resetting/resyncing/restarting wouldn't seem to solve anything; sql_slave_skip_counter (used as in the sketch below) is all very well but won't prevent the issue re-appearing.
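For completeness, a minimal sketch of the usual sql_slave_skip_counter procedure on the Slave; it steps over the failed event rather than fixing the cause, so the two servers can silently diverge:

-- On the Slave, from the mysql command line:
STOP SLAVE;
SET GLOBAL sql_slave_skip_counter = 1;  -- skip the one event that failed
START SLAVE;
SHOW SLAVE STATUS\G  -- confirm Slave_SQL_Running: Yes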
Advice and thoughts welcomed, especially on discovering how the keys are duplicated.
Best Answer
Revisiting this question some time later, while looking for answers to a similar problem, I have some answers for my past self (for MySQL 5.5):
1) Backing up the Slave Replication Server.
Log entries such as the 'Error reading relay log event: slave SQL thread was killed' / 'Slave SQL thread initialized' pair shown above are normal when starting and stopping a slave. mysqldump does this when the --dump-slave option is specified (see https://dev.mysql.com/doc/refman/5.5/en/mysqldump.html#option_mysqldump_dump-slave, which explains that the slave's SQL thread is stopped before the dump is taken and restarted afterwards).
Also see https://bugs.mysql.com/bug.php?id=70275.
2) Syncing Master and Slave for Replication.
Syncing a Master and Slave is tricky on a Master which is live, online and constantly being updated. The MariaDB documentation (https://mariadb.com/kb/en/library/setting-up-replication/), and other answers to this question, recommend taking the backup while the Master is locked against writes. In short, this means opening two terminals, locking the database with a read lock in one, and exporting the database with the other (see the sketch below).
The MariaDB documents don't cover the backup method used. Assuming you're performing a logical MySQL backup - remember this question involves exporting mixed InnoDB and MyISAM database tables - use --master-data without --single-transaction to enforce a global read lock.
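A minimal sketch of that two-terminal procedure, using the database name and backup path from earlier in this question:

-- Terminal 1: lock the Master against writes and keep this session open.
FLUSH TABLES WITH READ LOCK;
SHOW MASTER STATUS;  -- note the binary log file and position

# Terminal 2: take the dump while the lock is held.
# --master-data embeds a CHANGE MASTER TO statement with the co-ordinates;
# no --single-transaction, because MyISAM tables are involved.
mysqldump -u root --master-data --opt dbtoreplicate > /pathto/backups/dbtoreplicate.sql

-- Terminal 1: release the lock once the dump has finished.
UNLOCK TABLES;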
On the slave, after logging in to the mysql command line, you'll need to stop the slave, reset it, import the dump and restart replication (see the sketch below).
The RESET SLAVE command (https://dev.mysql.com/doc/refman/5.5/en/reset-slave.html) here is important. The Slave usually deletes the existing mysql-relay-bin.000001 files when the Master position is changed - by the CHANGE MASTER TO statement in the SQL backup - which is good, but it keeps mysql-relay-bin.index. In my experience this leads to duplicate key errors which can pop up hours or days after replication has started.
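A sketch of that slave-side sequence, assuming the dump file produced above (the replication co-ordinates are already embedded in it by the --master-data option):

-- On the Slave, from the mysql command line:
STOP SLAVE;
RESET SLAVE;  -- deletes the old relay logs AND mysql-relay-bin.index
SOURCE /pathto/backups/dbtoreplicate.sql;  -- loads the data and runs the embedded CHANGE MASTER TO
-- If the connection settings (MASTER_HOST, MASTER_USER, MASTER_PASSWORD)
-- are not already in place, set them with a further CHANGE MASTER TO here.
START SLAVE;
SHOW SLAVE STATUS\G  -- confirm Slave_IO_Running and Slave_SQL_Running are both Yes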