I've got a vanilla MySQL replication, using "MIXED" mode.
The mysql slave lag calculation behaves very strangely – one moment it is 0
and the one after it it's 3630
seconds (or a similar number), then back to 0
, and so on and so forth. Obviously, something is wrong in the replication configuration, since MySQL computes the lag based on the timestamps in the relay log.
I've tried checking the time of the servers, which is identical (using SELECT NOW()
in MySQL). I've also checked the timezone (SELECT @@system_time_zone
) which is set to CDT on both master and slave.
What else can I verify to make sure this issue is resolved? Did anyone else encounter this problem?
Best Answer
MySQL Replication performs weird calculations for Seconds_Behind_Master based on a few things.
Here is a SHOW SLAVE STATUS\G and how to identify the Last Executed SQL from the Master
Notice the following fields in SHOW SLAVE STATUS\G
To calculate Seconds_Behind_Master, it would essentially figured out
This number can intermittently grow because the I/O thread can collect more entries from the Master and loading it into the last relay log while processing the SQL statement located at Master LogFile 'Relay_Master_Log_File' and Master LogPos 'Exec_Master_Log_Pos'. This increasing number is also manifested by three(3) variables changing: Relay_Log_Space, Master_Log_File, Read_Master_Log_Pos. If Seconds_Behind_Master is 0, this is good indication that Read_Master_Log_Pos and Exec_Master_Log_Pos virtually the same. Also worth noting is the fact that long running SQL can fool you into thinking that replication is behind. Once that SQL statement is complete and no other statements have sufficiently backlogged, Seconds_Behind_Master can drop dramatically, even to 0.
So, why would Seconds_Behind_Master bounce back and forth between 0 and an increasing number. It is possible for network latency in transmitting SQL through the I/O thread to contribute to miscalculation because the needed
TIMESTAMP
variable was not reached yet in the back of the last relay log (remember this is asynchronous replication). To verify this, runmysqlbinlog
against any binary log or relay log and see where theTIMESTAMP
variables are written between statements.This behavior was first addressed June 22, 2007 and the bug was supposedly fixed then.
Here is one sure way to take MySQL Replication and screw its head on correctly (For this example, use the same SHOW SLAVE STATUS\G)
Using Relay_Master_Log_File mysql-bin.002814 and Exec_Master_Log_Pos 823078734 as the Log FilePosition to Restart From, run these commands
These steps should 1) kill both I/O and SQL Threads, 2) wipe out All Collected Relay Logs, 3) start with a New Relay Log, and 4) establish New I/O and SQL Threads.
Replication should be good to go from here. As to whether or not this bug can truly be addressed, this now in Oracle's hands (That last statement does not inspire any confidence, does it ???)