MySQL replication works but slave keeps reconnecting to master

master-slave-replication, MySQL, replication

I have a master-slave set-up with multiple slaves in different datacenters. One of the slaves stopped working normally today. This was easily resolved by restarting mysqld but ever since (and perhaps before) the Slave_IO_State keeps jumping from "Waiting for master to send event" to "Checking master version" to "Reconnecting after a failed master event read" (the order may be different, but these are the statuses I get from continuously running a SHOW SLAVE STATUS query).

The connection to the master is made via an SSH tunnel, and I'm able to connect to the master from the command line without a problem. Furthermore, replication seems to be working; the slave is up to date. What I don't understand is why it keeps disconnecting and reconnecting. No other errors show.

This is MySQL 5.5.54 running on CentOS 6 boxes.

Resetting the slave doesn't help. How do I get to the root of this issue?

EDIT: When it really fails, as before and it just happened again, the message is "The slave I/O thread stops because SET @master_heartbeat_period on master failed. Error: Lost connection to MySQL server during query".

Best Answer

It turns out my slave's server_id was too large. I was deriving each server_id from the slave's public IP address, and for this slave the resulting number exceeded the maximum of 4294967295. I should have checked when I decided to use this naming scheme, but a warning from MySQL would also have been nice.

All's well that ends well. I now use the last three octets of the public IP address (9 digits at most).
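
A quick way to check a candidate value against the limit is to compare it on the slave itself. This is only a sketch: the address 203.0.113.45 is a made-up example from the documentation range, and INET_ATON() is shown as one possible alternative, not the scheme described above.

```sql
-- Value the running server is using.
SELECT @@server_id;

-- Hypothetical IP 203.0.113.45 written as twelve zero-padded digits:
-- 203000113045 is larger than the 32-bit maximum 4294967295.
SELECT 203000113045 > 4294967295 AS exceeds_max;   -- returns 1

-- Alternative (an assumption, not the naming scheme above): INET_ATON()
-- packs an IPv4 address into one unsigned 32-bit integer, so it always fits.
SELECT INET_ATON('203.0.113.45') AS candidate_server_id;
```

Whatever scheme is used, every server in the replication topology needs its own distinct server_id.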

Related Solutions

MySQL Slave not updated

OPTION #1

Correct the filter in my.cnf so that it reads:

[mysqld]
replicate_wild_do_table = zo_dev_matrix.%

OPTION #2

Change the filter to the database zo_dev_matrix

replicate_do_db = zo_dev_matrix

Use this option only if none of your queries prefix the table name with the database name and you explicitly set the current database to zo_dev_matrix.
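
The reason for that condition: with statement-based replication, replicate_do_db is matched against the default database selected with USE, not against the database named inside the statement. A small sketch, where other_db, some_table, flag and id are hypothetical names:

```sql
-- Replicated: the default database matches replicate_do_db.
USE zo_dev_matrix;
UPDATE some_table SET flag = 1 WHERE id = 42;

-- NOT replicated under statement-based logging: the default database is
-- other_db, even though the statement changes a zo_dev_matrix table.
USE other_db;
UPDATE zo_dev_matrix.some_table SET flag = 1 WHERE id = 42;
```

With row-based logging the filter is applied to the database of the changed table instead, so the second update would be replicated.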

CAVEAT

With MySQL 5.6 and earlier, you must restart mysqld after changing the filter in my.cnf. Starting with MySQL 5.7, you can change the replication filter dynamically, so a restart is not necessary.
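
On 5.7 or later, the dynamic equivalent of OPTION #1 might look like the following sketch, run on the slave. It does not apply to the 5.5 and 5.6 series.

```sql
-- MySQL 5.7+ only. The slave SQL thread must be stopped before
-- the filter can be changed.
STOP SLAVE SQL_THREAD;
CHANGE REPLICATION FILTER REPLICATE_WILD_DO_TABLE = ('zo_dev_matrix.%');
START SLAVE SQL_THREAD;
```

A filter set this way in 5.7 is not kept across a restart, so the same setting should still go into my.cnf.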

MySQL Replication: How to restore master FROM slave

1) Backups:

You can check that a mysqldump run finished by looking at the last line of the dump file: by default, a complete dump ends with a "-- Dump completed" comment.
An invalid backup is as good as (or as bad as) no backup. Put a backup-validation process in place that restores each backup to a provisioned test location whenever possible. Do it.
If you want to stick with logical backups, try mydumper/myloader, which is quicker than mysqldump.
Physical backups are faster than logical ones, especially for large data sets. If you can give up mysqldump, which is slow to restore, and especially if all your tables are InnoDB, consider Percona's xtrabackup for hot backups. Look into installing and configuring xtrabackup with the Holland backup framework.

2) Restoring master from slave.

There are few questions and articles available on this, so I will quickly note down the steps I can make out for restoring a master from a slave.

Assume your master is M and your slave is S (M --> S).
M goes down.
S is promoted to master (read_only = OFF, binary logging enabled).
M is later fixed and ready to rejoin, but needs a restore.
Restore M from the latest backup and set read_only = ON on it.
After the restore, make M a slave of S (S --> M).
Once M catches up, you may want to fail back to the original master.
Stop production traffic, then set read_only = OFF on M and read_only = ON on S.
Make S a slave of M (a simple CHANGE MASTER TO using M's current binlog position).
This makes them an active-passive master-master pair.
Move traffic to M.
You can break the master-master link afterwards, but it is fine to keep the active-passive pair,
as long as you make sure nothing writes to the passive master.
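
The steps above can be sketched as the following statements. Host names, the replication account, the password and all binlog coordinates are placeholders I made up for illustration. In particular, the file and position given to M must correspond to the point in S's binary log that the restored backup represents.

```sql
-- On S, when promoting it (M is down):
STOP SLAVE;
SET GLOBAL read_only = OFF;

-- On M, after restoring it from the latest backup:
SET GLOBAL read_only = ON;
CHANGE MASTER TO
  MASTER_HOST     = 's.example.com',     -- hypothetical host name of S
  MASTER_USER     = 'repl',              -- hypothetical replication account
  MASTER_PASSWORD = 'replace-me',        -- placeholder
  MASTER_LOG_FILE = 'mysql-bin.000123',  -- placeholder, taken from the backup
  MASTER_LOG_POS  = 4;                   -- placeholder, taken from the backup
START SLAVE;
-- Repeat until Seconds_Behind_Master reaches 0:
SHOW SLAVE STATUS;

-- Failback, with production traffic stopped.
-- On M:
SET GLOBAL read_only = OFF;
SHOW MASTER STATUS;                      -- note File and Position

-- On S:
SET GLOBAL read_only = ON;
CHANGE MASTER TO
  MASTER_HOST     = 'm.example.com',     -- hypothetical host name of M
  MASTER_USER     = 'repl',
  MASTER_PASSWORD = 'replace-me',
  MASTER_LOG_FILE = 'mysql-bin.000001',  -- File from SHOW MASTER STATUS on M
  MASTER_LOG_POS  = 4;                   -- Position from SHOW MASTER STATUS on M
START SLAVE;
```

Note that read_only does not stop accounts with the SUPER privilege from writing, so application accounts on the passive master should not have it.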

Hope this helps.
