MySQL Master–Slave replication stops after 20 minutes with no errors

Tags: mariadb, mysql, replication

I'm using MariaDB 10.4 in a standard master–slave setup.

Everything works fine for about 20 minutes and I can see changes replicating across, but then replication just stops.

However, if I restart the slave MySQL instance, replication begins again and catches up.

Restarting the master has no effect; replication stays broken.

When it's in the broken, non-replicating state, this is the output from the slave and the master.

[Screenshots of the slave's and the master's status output]
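
For reference, the screenshots came from SHOW SLAVE STATUS\G on the slave and SHOW MASTER STATUS on the master; the fields worth comparing are noted below.

    -- On the slave:
    SHOW SLAVE STATUS\G
    -- Key fields: Slave_IO_Running, Slave_SQL_Running,
    -- Master_Log_File / Read_Master_Log_Pos, Seconds_Behind_Master.

    -- On the master:
    SHOW MASTER STATUS;
    -- If the slave's Read_Master_Log_Pos falls behind the master's Position
    -- while Seconds_Behind_Master still reads 0, the slave has stopped
    -- reading events without noticing the connection is gone.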

So it looks like it thinks it's caught up, but it hasn't, no matter how long I leave it.

I've used netcat to test the connection, so I know the slave can still see the master; plus, when I restart MySQL it works right away.

The my.cnf config has not been changed apart from adding the server ID, and this is how I configured the slave initially:

    CHANGE MASTER TO
      MASTER_HOST='dynamicip.xxxxxxx.org',
      MASTER_PORT=4302,
      MASTER_USER='remote_replication',
      MASTER_PASSWORD='xxxxxxxxxx',
      MASTER_LOG_FILE='mysql-bin.000006',
      MASTER_LOG_POS=328;

The error logs contain no errors either.

Where would I begin to troubleshoot this?

EDIT / UPDATE

So it appears the data does replicate over eventually, but it takes at least 50 minutes, and I'm testing it with tiny bits of data.

I've switched the hostname dynamic.****.org to an IP address in case Cloudflare was doing something weird, but that hasn't fixed it. I also tried skip-host-cache before that.

  • I've also switched innodb_flush_log_at_trx_commit from 0 to 1.
  • Traceroute looks fine.
  • I can see the statements in the binlog file on the master (command below).
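
For reference, the binlog contents can be confirmed from the client as well; the log file name below is the one from my CHANGE MASTER TO statement, so check SHOW MASTER STATUS first in case the log has rotated to a newer file.

    -- On the master: list recent events in the binlog
    SHOW BINLOG EVENTS IN 'mysql-bin.000006' LIMIT 20;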

Interestingly, using STOP SLAVE and START SLAVE kicks it back into action for a while before the issue starts again.
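
The exact statements, run on the slave:

    STOP SLAVE;
    START SLAVE;

    -- Since it's the I/O thread that seems to die, restarting just that
    -- thread may also be enough (untested here):
    STOP SLAVE IO_THREAD;
    START SLAVE IO_THREAD;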

SHOW PROCESSLIST on the master shows that the I/O thread is sometimes present and sometimes not; I'm not sure what that means.
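
For anyone checking the same thing: on the master, the slave's I/O thread shows up as a connection from the replication user running the Binlog Dump command, so that row is what appears and disappears.

    -- On the master:
    SHOW PROCESSLIST;
    -- or filtered down to just the replication connection:
    SELECT ID, USER, HOST, COMMAND, TIME, STATE
    FROM information_schema.PROCESSLIST
    WHERE COMMAND LIKE 'Binlog Dump%'
       OR USER = 'remote_replication';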

Best Answer

The first thing I would do is check the network between the Master and the Slave. Why?

Almost 8 years ago, I wrote this post: Slave I/O thread dies very often

I once had an occasion where I saw a Slave with SHOW SLAVE STATUS\G looking exactly as you have it, but it was not replicating. This is what actually happened:

  • The firewall on the Slave would become intermittent
  • The Master could not talk back to the Slave through the I/O thread
  • SHOW PROCESSLIST on the Master revealed that the I/O thread was no longer there
  • The Slave assumed the Master was OK (more on this below)
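
That last point is the crux: the Slave only declares the connection dead after slave_net_timeout seconds of silence, so a silently dropped connection can look healthy for a long time. A minimal sketch of the knobs to inspect and tighten on the Slave (the values are examples, not tuned recommendations):

    -- How long the Slave waits before deciding the Master has gone quiet
    SHOW GLOBAL VARIABLES LIKE 'slave_net_timeout';

    -- Shorten it so a dead link is noticed within about a minute
    SET GLOBAL slave_net_timeout = 60;

    -- Have the Master send heartbeats across an idle link
    STOP SLAVE;
    CHANGE MASTER TO MASTER_HEARTBEAT_PERIOD = 30;
    START SLAVE;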

What I recommend is this:

  • Start troubleshooting with a traceroute to dynamic....org
  • See if DNS is acting up. If DNS is involved, add skip-name-resolve and skip-host-cache to my.cnf on both the Master and the Slave and restart mysqld; a quick check follows below
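
Both options go under the [mysqld] section. skip_name_resolve is read-only at runtime, so it only takes effect through my.cnf plus a restart; afterwards, confirm it stuck:

    -- On both Master and Slave, after restarting mysqld:
    SHOW GLOBAL VARIABLES LIKE 'skip_name_resolve';
    -- Expect: Value = ON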

Please let us know what you found in your troubleshooting.