MySQL Replication: Seconds Behind Master super high

MySQLreplication

I've set up a slave db server for my production database, but when I checked the show slave status, I noticed a super big number in seconds behind master.

This is the output:

           Slave_IO_State: Waiting for master to send event
              Master_Host: 1.2.3.4
              Master_User: replicator
              Master_Port: 3306
            Connect_Retry: 60
          Master_Log_File: mysql-bin.000173
      Read_Master_Log_Pos: 15909435
           Relay_Log_File: mysqld-relay-bin.000079
            Relay_Log_Pos: 91173356
    Relay_Master_Log_File: mysql-bin.000093
         Slave_IO_Running: Yes
        Slave_SQL_Running: Yes
          Replicate_Do_DB: 
      Replicate_Ignore_DB: 
       Replicate_Do_Table: 
   Replicate_Ignore_Table: 
  Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
               Last_Errno: 0
               Last_Error: 
             Skip_Counter: 0
      Exec_Master_Log_Pos: 91173210
          Relay_Log_Space: 8179978166
          Until_Condition: None
           Until_Log_File: 
            Until_Log_Pos: 0
       Master_SSL_Allowed: No
       Master_SSL_CA_File: 
       Master_SSL_CA_Path: 
          Master_SSL_Cert: 
        Master_SSL_Cipher: 
           Master_SSL_Key: 
    Seconds_Behind_Master: 486330
Master_SSL_Verify_Server_Cert: No
            Last_IO_Errno: 0
            Last_IO_Error: 
           Last_SQL_Errno: 0
           Last_SQL_Error: 
Replicate_Ignore_Server_Ids: 
         Master_Server_Id: 1
1 row in set (0.00 sec)

ERROR: 
No query specified

Then when I run SHOW PROCESSLIST, I see that the time of the thread matches the time indicated in seconds behind:

mysql> SHOW PROCESSLIST;

| 40 | system user |           | NULL | Connect |  66530 | Waiting for master to send event | NULL             |
| 41 | system user |           | NULL | Connect | 486330 | Reading event from the relay log | NULL             |
| 45 | root        | localhost | NULL | Query   |      0 | NULL                             | SHOW PROCESSLIST |

That time is dropping, slowly. Read_Master_Log_Pos, Relay_Log_Pos, Exec_Master_Log_Pos and Relay_Log_Space are changing all the time.

I've also check the time/date and both servers are on sync.

On the Master side:

mysql> SHOW PROCESSLIST;

| 66739 | replicator | 1.2.3.5:52884 | NULL                | Binlog Dump |    65671 | Master has sent all binlog to slave; waiting for binlog to be updated | NULL             

and show slave hosts looks empty…

mysql> SHOW SLAVE HOSTS;
+-----------+------+------+-----------+
| Server_id | Host | Port | Master_id |
+-----------+------+------+-----------+
|         2 |      | 3306 |         1 |
+-----------+------+------+-----------+
1 row in set (0.00 sec)

mysql> 

So what is actually going on here? Looks like the slave is actually connected and working, but very very slow? Can someone give me some hints on how to do more debug on this? Server is rather idle at 95%.

Best Answer

When you see the Seconds_Behind_Master that high, I look at the following:

Relay_Log_Space: 8179978166

You have 7.6182GB of relay logs to process.

Master_Log_File: mysql-bin.000173
Relay_Master_Log_File: mysql-bin.000093

This tell me that you have read up to mysql-bin.000173, but you are currently processing things from the mysql-bin.000093.

This also tell me you have about 80 binary logs on the Master, each about 100 MB.

The Seconds_Behind_Master is simply the NOW() minus the TIMESTAMP set at mysql-bin.000093 (Relay_Master_Log_File) position 91173210 (Exec_Master_Log_Pos).

As long as Slave_SQL_Thread is Yes, the relay logs get processed

  • Relay_Log_Space will decrease each time a relay log is done
  • Exec_Master_Log_Pos will increase until current relay log is done, then resets to the beginning of the next relay
  • TIMESTAMP keeps increasing, which make Seconds_Behind_Master decrease (NOW() minus the TIMESTAMP set at Relay_Master_Log_File position Exec_Master_Log_Pos)

This is what happens when replication is off for 486330 seconds (5 days 15 hours 5 minutes 29 seconds) and you run start slave;

Look at your SHOW PROCESSLIST;. The IO Thread has been up for 66530 seconds (18 hours 28 minutes 50 seconds). This means someone or something started replication 18 hours 28 minutes 50 seconds ago.

You stated in your question that you have set up replication for the production server. This means that you ran the mysqldump 5 days 15 hours 5 minutes 29 seconds ago and started replicating from the production master 18 hours 28 minutes 50 seconds ago.

If you had setup the Slave the same day you got the mysqldump from the Master the replication load would be a lot less. Notwithstanding, replication is working normally provided Slave_IO_Thread and Slave_SQL_Thread both say Yes.