MySQL Slave lag in SHOW SLAVE STATUS does not match SHOW PROCESSLIST

MySQL, replication

I'm trying to figure out why the reporting of slave lag is different in SHOW SLAVE STATUS and SHOW PROCESSLIST, in MySQL 5.5.13.
The only difference from the other slaves of this master is that this one replicates over a relatively slow connection, about 10Mb/sec (cross site).

SHOW SLAVE STATUS reports a slave lag of 0 most of the time, and shows the real slave lag only about once every 5-10 requests. (I'm looking at Seconds_Behind_Master.)

SHOW PROCESSLIST shows the slave lag in the Time column of the "system user" row that belongs to the replication SQL thread, like so:

mysql> show processlist \G
*************************** 1. row ***************************
     Id: 1
   User: system user
   Host: 
     db: NULL
Command: Connect
   Time: 63953
  State: Waiting for master to send event
   Info: NULL
*************************** 2. row ***************************
     Id: 2
   User: system user
   Host: 
     db: NULL
Command: Connect
   Time: 61077
  State: Slave has read all relay log; waiting for the slave I/O thread to update it
   Info: NULL

Why would these differ? Or rather, why would SHOW SLAVE STATUS lie? The monitoring system is looking at this command, and goes crazy due to "spikes" once in a while.
I know for a fact the right number is what SHOW PROCESSLIST reports, since the slave took a good few days to be initialized, and is catching up slowly.

Best Answer

The "Time" of the SQL thread is (I think) identical to Seconds_Behind_Master. It means "How long ago did this query start on the Master".

All other Time values indicate when the query started on the Slave.
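To put the two readings side by side, the `\G` output shown in the question can be parsed into rows and filtered down to the replication threads. This is only a minimal sketch: the function names `parse_vertical` and `replication_threads` are made up for illustration, and the sample text is copied from the question rather than read from a live server.

```python
def parse_vertical(text):
    """Parse mysql client \\G (vertical) output into a list of dicts, one per row."""
    rows = []
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        # A line such as "***** 1. row *****" starts a new record.
        if stripped.startswith("***") and "row" in stripped:
            current = {}
            rows.append(current)
            continue
        if current is None or ":" not in line:
            continue
        # Split on the first colon only, so values may contain colons.
        key, _, value = line.partition(":")
        current[key.strip()] = value.strip()
    return rows


def replication_threads(rows):
    """Keep only the rows owned by the internal 'system user'."""
    return [r for r in rows if r.get("User") == "system user"]


# Sample copied from the question; not live data.
SAMPLE = """\
*************************** 1. row ***************************
     Id: 1
   User: system user
   Host: 
     db: NULL
Command: Connect
   Time: 63953
  State: Waiting for master to send event
   Info: NULL
*************************** 2. row ***************************
     Id: 2
   User: system user
   Host: 
     db: NULL
Command: Connect
   Time: 61077
  State: Slave has read all relay log; waiting for the slave I/O thread to update it
   Info: NULL
"""

if __name__ == "__main__":
    for row in replication_threads(parse_vertical(SAMPLE)):
        print(row["Id"], row["Time"], row["State"])
```

In the sample, the row whose State mentions the relay log is the SQL thread, so its Time is the value to compare against Seconds_Behind_Master.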

Some fluctuation is caused by what it is measuring (the Master's start time).

Sometimes (rarely), I see the value (both places) bouncing between 0 and some large value. I have yet to track this down. I have seen it on 4.0, 4.1, and 5.1. It eventually goes away, and becomes civilized.

There may be cases where no traffic leads to strange values. But I don't have any Master-Slave setups with little enough traffic for me to comment.

Suppose you run an ALTER on the Master, and it takes 1 hour (3600 seconds). Also, suppose not much else is going on. The ALTER replicates and starts running on the Slave. Immediately, Seconds_Behind_Master will be about 3600. After the ALTER finishes on the Slave (say, 3600 more seconds later), subsequent replication items will execute with (probably) smaller Times. Eventually replication catches up.
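The arithmetic in the ALTER example can be written out as a small sketch. It uses only the simplified model described above (lag is the time since the statement started on the Master) and assumes both servers share one clock and that the ALTER takes the same time on the Slave. The real server applies further corrections, so treat the numbers as an illustration, not as what mysqld computes. All names and values are hypothetical.

```python
def apparent_lag(slave_now, master_start):
    """Seconds since the statement started on the master (simplified model)."""
    return max(0, slave_now - master_start)


# All times are seconds on one shared clock; the values are illustrative only.
ALTER_START = 0        # the ALTER begins on the master
ALTER_DURATION = 3600  # one hour on the master; assumed identical on the slave

# The slave can only start the ALTER once the master has finished it.
slave_begins = ALTER_START + ALTER_DURATION
slave_ends = slave_begins + ALTER_DURATION

# A hypothetical later statement that started on the master
# 100 seconds after the ALTER finished there.
next_master_start = ALTER_START + ALTER_DURATION + 100

if __name__ == "__main__":
    print(apparent_lag(slave_begins, ALTER_START))      # when the slave starts the ALTER
    print(apparent_lag(slave_ends, ALTER_START))        # when the slave finishes the ALTER
    print(apparent_lag(slave_ends, next_master_start))  # for the next statement in the queue
```

Under these assumptions the three values are 3600, 7200 and 3500: the lag keeps growing while the ALTER runs on the Slave, then drops for the statements queued behind it.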

MySQL: Show full processlist has empty Host name. Why

UPDATE 2012-04-27 18:00 EDT

Questions from your comment

Can I rename the system user, or change any of its other properties? Also, is the system user dedicated only to replication, or are other MySQL processes spawned by it? I understand that the system user cannot be accessed by a client; it's an internal process spawned by MySQL.

Answers to Questions from your comment

No, you cannot rename the system user. It is dedicated to handling MySQL Replication only. The only way to manipulate properties of the system user would be through the GRANT command issued to create the replication user.

For example, when you set up a replication user, you issue a command like this:

GRANT REPLICATION SLAVE,REPLICATION CLIENT ON *.* to 'repl'@'%';

When START SLAVE is issued on a Slave, the Master authenticates the DB Thread coming from the Slave and assigns one thread on the Master. That thread ships binary log entries to the I/O Thread on the Slave. The I/O Thread on the Slave is assigned to system user for handling communication between Master and Slave. The SQL Thread on the Slave is also assigned to system user; it hands the local relay log entries to mysqld running on the Slave to be processed FIFO (First In, First Out). No direct access is permitted via the MySQL Client on the Slave except for

STOP SLAVE; (Kills both I/O Thread and SQL Thread)
STOP SLAVE IO_THREAD;
STOP SLAVE SQL_THREAD;
START SLAVE; (Creates both I/O Thread and SQL Thread)
START SLAVE IO_THREAD;
START SLAVE SQL_THREAD;

Of course, you could issue KILL ####; where #### is the process ID of either the I/O Thread or SQL Thread. You would then be totally responsible for reestablishing replication, at the risk of losing the correct log file and position if the KILL command interrupts any communication through the unnatural stoppage of a replication thread.
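One way to reduce that risk is to record the SQL thread's coordinates before touching the threads. In SHOW SLAVE STATUS, Relay_Master_Log_File and Exec_Master_Log_Pos give the Master binary log file and position of the last event the Slave executed. The sketch below builds a CHANGE MASTER TO statement from a saved status row. The function name `resume_statement` and the sample values are hypothetical, and the statement is only meant to be reviewed by a person and run with the Slave stopped.

```python
def resume_statement(slave_status):
    """Build a CHANGE MASTER TO statement from a saved SHOW SLAVE STATUS row.

    slave_status is a dict of column name -> value, as captured before the
    replication threads were stopped or killed.
    """
    log_file = str(slave_status["Relay_Master_Log_File"])
    log_pos = int(slave_status["Exec_Master_Log_Pos"])
    # Escape backslashes and single quotes so the file name is a valid SQL literal.
    escaped = log_file.replace("\\", "\\\\").replace("'", "''")
    return "CHANGE MASTER TO MASTER_LOG_FILE='%s', MASTER_LOG_POS=%d;" % (
        escaped,
        log_pos,
    )


if __name__ == "__main__":
    # Hypothetical values, not taken from the question.
    saved = {
        "Relay_Master_Log_File": "mysql-bin.000123",
        "Exec_Master_Log_Pos": "4567",
    }
    print(resume_statement(saved))
```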

MySQL Replication not proceeding

This turned out not to be a MySQL issue at all. The network team had recently installed a new security device that blocks packets based on certain rules. A legitimate database write contained a sequence of characters the device deemed nefarious.

The connection handshake for replication got through fine, but then the Slave just sat there asking the Master for the next log entry, whose packets never made it back.

As far as the broken Slave was concerned, it was up to date, because it had executed the most recent event in the relay logs it had received.
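A Slave in this state can report Seconds_Behind_Master as 0, because (as the MySQL manual notes) that field essentially measures how far the SQL thread is behind the I/O thread, not behind the Master. A check that compares binary log coordinates on both sides does not have this blind spot. The sketch below assumes the monitoring system has already fetched SHOW MASTER STATUS from the Master and SHOW SLAVE STATUS from the Slave as dicts, and that binary log names end in a numeric sequence such as `mysql-bin.000042`. The function names and sample values are hypothetical.

```python
def binlog_sequence(file_name):
    """Numeric suffix of a binary log name, e.g. 'mysql-bin.000042' -> 42."""
    return int(file_name.rsplit(".", 1)[1])


def io_thread_behind(master_status, slave_status):
    """Return True if the slave has not yet read everything the master has written.

    master_status: row from SHOW MASTER STATUS (columns File, Position).
    slave_status:  row from SHOW SLAVE STATUS
                   (columns Master_Log_File, Read_Master_Log_Pos).
    """
    master = (
        binlog_sequence(master_status["File"]),
        int(master_status["Position"]),
    )
    slave = (
        binlog_sequence(slave_status["Master_Log_File"]),
        int(slave_status["Read_Master_Log_Pos"]),
    )
    # Tuples compare by file sequence first, then by position within the file.
    return master > slave


if __name__ == "__main__":
    # Hypothetical values for illustration.
    master = {"File": "mysql-bin.000042", "Position": 900}
    slave = {"Master_Log_File": "mysql-bin.000042", "Read_Master_Log_Pos": 120}
    print(io_thread_behind(master, slave))
```

A single True is normal on a busy Master; the signal worth alerting on is a Read_Master_Log_Pos that stays frozen across several checks while the Master's position keeps moving.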
