MySQL Slave lag in SHOW SLAVE STATUS does not match SHOW PROCESSLIST

MySQLreplication

I'm trying to figure out why the reporting of slave lag is different in SHOW SLAVE STATUS and SHOW PROCESSLIST, in MySQL 5.5.13.
The only difference to other slaves of this master is that it is replicating over a relatively slow connection, about 10Mb/sec (cross site).

SHOW SLAVE STATUS reports a slave lag of 0, or occasionally, the real slave lag, once every 5-10 requests. (I'm looking at Seconds_Behind_Master)

SHOW PROCESSLIST, shows the slave lag under the system_user's time column (the one that belongs to the replication sql thread), like so:

mysql> show processlist \G
*************************** 1. row ***************************
     Id: 1
   User: system user
   Host: 
     db: NULL
Command: Connect
   Time: 63953
  State: Waiting for master to send event
   Info: NULL
*************************** 2. row ***************************
     Id: 2
   User: system user
   Host: 
     db: NULL
Command: Connect
   Time: 61077
  State: Slave has read all relay log; waiting for the slave I/O thread to update it
   Info: NULL

Why would these differ? Or rather, why would SHOW SLAVE STATUS lie? The monitoring system is looking at this command, and goes crazy due to "spikes" once in a while.
I know for a fact the right number is what SHOW PROCESSLIST reports, since the slave took a good few days to be initialized, and is catching up slowly.

Best Answer

The "Time" in the SQL thread is (I think) identical to Seconds_behind_master. It is "How long ago did this query start on the Master ".

All other Times are indicate when the query started on the Slave.

Some fluctuation is caused by what it is measuring (the Master's start time).

Sometimes (rarely), I see the value (both places) bouncing between 0 and some large value. I have yet to track this down. I have seen it on 4.0, 4.1, and 5.1. It eventually goes away, and becomes civilized.

There may be cases where no traffic leads to strange values. But I don't have any Master-Slave setups with little enough traffic for me to comment.

Suppose you do ALTER on the Master, and it took 1 hour (3600 seconds). Also, suppose not much else is going on. The ALTER replicates and starts running. Immediately, the Seconds_behind_master will be about 3600. After the ALTER finishes on the Slave (say, 3600 more seconds later), subsequent replication items will execute with (probably) smaller Times. Eventually replication catches up.