MySQL replication works but slave keeps reconnecting to master

Tags: master-slave-replication, MySQL, replication

I have a master-slave setup with multiple slaves in different datacenters. One of the slaves stopped working normally today. Restarting mysqld fixed that, but ever since (and perhaps before) the Slave_IO_State has kept jumping between "Waiting for master to send event", "Checking master version" and "Reconnecting after a failed master event read". The order may differ; these are the states I see when running SHOW SLAVE STATUS repeatedly.

The connection to the master is made via an SSH tunnel, and I'm able to connect to the master from the command line without a problem. Furthermore, replication seems to be working; the slave is up to date. What I don't understand is why it keeps disconnecting and reconnecting. No other errors show.

This is MySQL 5.5.54 running on CentOS 6 boxes.

Resetting the slave doesn't help. How do I get to the root of this issue?

EDIT: When replication fails outright, as it did before and has just done again, the message is "The slave I/O thread stops because SET @master_heartbeat_period on master failed. Error: Lost connection to MySQL server during query".

Best Answer

It turns out my slave's server_id was too large. I was deriving each server_id from the slave's public IP address, and for this slave the resulting number exceeded the maximum allowed value of 4294967295. I should have checked the limit when I decided to use this naming scheme, but a warning from MySQL would also have been nice.
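To make the overflow concrete, here is a minimal Python sketch. It assumes the server_id was built by simply stripping the dots from the IP address, which the answer does not spell out; the address 198.51.100.200 is a documentation-range placeholder, and both function names are hypothetical.

```python
MAX_SERVER_ID = 4294967295  # maximum quoted in the answer (2**32 - 1)


def ip_digits_as_int(ip: str) -> int:
    """Assumed naming scheme: drop the dots and read the digits as one number."""
    return int(ip.replace(".", ""))


def fits_server_id(value: int) -> bool:
    """True if the value lies in the range this sketch treats as usable.

    The lower bound of 1 is a choice made here so that 0 is rejected too.
    """
    return 1 <= value <= MAX_SERVER_ID


candidate = ip_digits_as_int("198.51.100.200")  # placeholder address
print(candidate, fits_server_id(candidate))     # 19851100200 False
```

Any address whose digits add up to eleven or twelve characters will fail this check, and so will some ten-digit ones, so running it over every slave's address before choosing the scheme would have caught the problem.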

All's well that ends well. I now use only the last three octets of the public IP address, which gives at most 9 digits and so always stays below the limit.
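A rough sketch of that scheme in Python, in case it helps anyone adopting it. The function name is hypothetical, and the zero-padding of each octet is an addition of mine, not something stated above: without it, different addresses such as x.1.11.1 and x.11.1.1 would collapse to the same number.

```python
import ipaddress

MAX_SERVER_ID = 4294967295  # maximum quoted in the answer (2**32 - 1)


def server_id_from_last_three_octets(ip: str) -> int:
    """Hypothetical helper: join the last three octets, zero-padded to 3 digits each."""
    octets = ipaddress.IPv4Address(ip).packed  # 4 bytes; raises on an invalid address
    server_id = int("".join(f"{octet:03d}" for octet in octets[1:]))
    # Defensive check: the largest possible result is 255255255, so only an
    # address ending in .0.0.0 (which yields 0) can fail it.
    if not 1 <= server_id <= MAX_SERVER_ID:
        raise ValueError(f"server_id {server_id} derived from {ip} is out of range")
    return server_id


print(server_id_from_last_three_octets("198.51.100.200"))  # 51100200
```

Note that two slaves sharing the same last three octets would still get the same ID, so the result is worth checking for uniqueness across all servers in the replication topology.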