I'm trying to figure out why the reporting of slave lag is different in SHOW SLAVE STATUS
and SHOW PROCESSLIST
, in MySQL 5.5.13.
The only difference to other slaves of this master is that it is replicating over a relatively slow connection, about 10Mb/sec (cross site).
SHOW SLAVE STATUS
reports a slave lag of 0, or occasionally, the real slave lag, once every 5-10 requests. (I'm looking at Seconds_Behind_Master
)
SHOW PROCESSLIST
, shows the slave lag under the system_user
's time
column (the one that belongs to the replication sql thread
), like so:
mysql> show processlist \G *************************** 1. row *************************** Id: 1 User: system user Host: db: NULL Command: Connect Time: 63953 State: Waiting for master to send event Info: NULL *************************** 2. row *************************** Id: 2 User: system user Host: db: NULL Command: Connect Time: 61077 State: Slave has read all relay log; waiting for the slave I/O thread to update it Info: NULL
Why would these differ? Or rather, why would SHOW SLAVE STATUS
lie? The monitoring system is looking at this command, and goes crazy due to "spikes" once in a while.
I know for a fact the right number is what SHOW PROCESSLIST
reports, since the slave took a good few days to be initialized, and is catching up slowly.
Best Answer
The "Time" in the SQL thread is (I think) identical to Seconds_behind_master. It is "How long ago did this query start on the Master ".
All other Times are indicate when the query started on the Slave.
Some fluctuation is caused by what it is measuring (the Master's start time).
Sometimes (rarely), I see the value (both places) bouncing between 0 and some large value. I have yet to track this down. I have seen it on 4.0, 4.1, and 5.1. It eventually goes away, and becomes civilized.
There may be cases where no traffic leads to strange values. But I don't have any Master-Slave setups with little enough traffic for me to comment.
Suppose you do ALTER on the Master, and it took 1 hour (3600 seconds). Also, suppose not much else is going on. The ALTER replicates and starts running. Immediately, the Seconds_behind_master will be about 3600. After the ALTER finishes on the Slave (say, 3600 more seconds later), subsequent replication items will execute with (probably) smaller Times. Eventually replication catches up.