Mysql – Laggy slave OK to have only one system user in processlist

MySQLreplication

In all the examples of a processlist on a mysql slave I've seen, there are two system users. I have a slave that is very far behind the master. When I do show a processlist, there is only one system user (State: "Waiting for master to send event"), the event scheduler, and a trigger user. Is this a symptom, cause, or nothing to do with excessive lag?

Both servers are 5.6.19 on FreeBSD. Topology: master –> slave –> slave in question. Slave IO and SQL threads show as running. Relay_Log_Space is very large.

show slave status\G
*************************** 1. row ***************************
           Slave_IO_State: Queueing master event to the relay log
              Master_Host: 10.02.00.0
              Master_User: repl
              Master_Port: 3306
            Connect_Retry: 60
          Master_Log_File: master-bin.004634
      Read_Master_Log_Pos: 473351239
           Relay_Log_File: slave-relay-bin.004460
            Relay_Log_Pos: 506886175
    Relay_Master_Log_File: master-bin.004515
         Slave_IO_Running: Yes
        Slave_SQL_Running: Yes
          Replicate_Do_DB:
      Replicate_Ignore_DB:
       Replicate_Do_Table:
   Replicate_Ignore_Table:
  Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
               Last_Errno: 0
               Last_Error:
             Skip_Counter: 0
      Exec_Master_Log_Pos: 506886012
          Relay_Log_Space: 62863753813
          Until_Condition: None
           Until_Log_File:
            Until_Log_Pos: 0
       Master_SSL_Allowed: No
       Master_SSL_CA_File:
       Master_SSL_CA_Path:
          Master_SSL_Cert:
        Master_SSL_Cipher:
           Master_SSL_Key:
    Seconds_Behind_Master: 245653
Master_SSL_Verify_Server_Cert: No
            Last_IO_Errno: 0
            Last_IO_Error:
           Last_SQL_Errno: 0
           Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
         Master_Server_Id: 42
              Master_UUID: a36f5f4b-0904-11e4-a130-b8ca3a622758
         Master_Info_File: /var/lib/mysql/master.info
                SQL_Delay: 0
      SQL_Remaining_Delay: NULL
  Slave_SQL_Running_State: closing tables
       Master_Retry_Count: 86400
              Master_Bind:
  Last_IO_Error_Timestamp:
 Last_SQL_Error_Timestamp:
           Master_SSL_Crl:
       Master_SSL_Crlpath:
       Retrieved_Gtid_Set:
        Executed_Gtid_Set:
            Auto_Position: 0
1 row in set (0.01 sec)


show processlist\G
*************************** 1. row ***************************
 Id: 1
User: event_scheduler
Host: localhost
 db: NULL
Command: Daemon
Time: 74
State: Waiting for next activation
Info: NULL
*************************** 2. row ***************************
 Id: 87720
User: system user
Host:
db: NULL
Command: Connect
Time: 8650
State: Waiting for master to send event
Info: NULL
*************************** 3. row ***************************
 Id: 87721
User: routine_user
Host: localhost
 db: db_name
Command: Connect
Time: 246665
State: closing tables
Info: SELECT
    IFNULL(CAST(LEFT(CONCAT('0', charValue), 10) as decimal(18,9)), 0)
INTO
    converted_

Best Answer

Having replication with only one system user sounds a lot like MySQL 4.1 replication. It was single threaded. That does not sound anything like MySQL 5.x replication.

When you run SHOW SLAVE STATUS\G, you will see something like this:

mysql> SHOW SLAVE STATUS\G
*************************** 1. row ***************************
             Slave_IO_State: Waiting for master to send event
                Master_Host: 10.48.20.253
                Master_User: replicant
                Master_Port: 3306
              Connect_Retry: 60
            Master_Log_File: mysql-bin.000262
        Read_Master_Log_Pos: 803779735
             Relay_Log_File: relay-bin.067799
              Relay_Log_Pos: 402744
      Relay_Master_Log_File: mysql-bin.000262
           Slave_IO_Running: Yes
          Slave_SQL_Running: Yes
            Replicate_Do_DB:
        Replicate_Ignore_DB:
         Replicate_Do_Table:
     Replicate_Ignore_Table:
    Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
                 Last_Errno: 0
                 Last_Error:
               Skip_Counter: 0
        Exec_Master_Log_Pos: 803779735
            Relay_Log_Space: 402598
            Until_Condition: None
             Until_Log_File:
              Until_Log_Pos: 0
         Master_SSL_Allowed: No
         Master_SSL_CA_File:
         Master_SSL_CA_Path:
            Master_SSL_Cert:
          Master_SSL_Cipher:
             Master_SSL_Key:
      Seconds_Behind_Master: 0
1 row in set (0.00 sec)

Please note the two lines

           Slave_IO_Running: Yes
          Slave_SQL_Running: Yes

When you see Yes to both of these lines, there must exist two process IDs with system user as the user. If you only see one, then either the IO thread (network connection back to the Master) is gone, or the SQL thread (load thread to read relay logs) encountered a MySQL error.

Thread States for Replication

  • IO is Yes and SQL is Yes : Replication should be running
  • IO is No and SQL is Yes : Network Connection to Master is gone
  • IO is Yes and SQL is No : SQL Error reading from Relay Logs
  • IO is No and SQL is No
    • Someone or Something ran STOP SLAVE;
    • MySQL started up with skip-slave-start configured

If both Slave_IO_Running and Slave_SQL_Running are Yes and Relay_Log_Space keeps on increasing, then my guess would be that LOAD DATA INFILE was executed on the Master and all the data is being shipped through the relay logs as a single unit (See my post MySQL 5.6 showing wrong "second_behind_master").

The only other thing that can cause

  • Slave_IO_Running to be Yes
  • Slave_SQL_Running to be Yes
  • Relay_Log_Space to be be increasing

Would be a very heavy write transaction with row-based replication executed as a group commit.

UPDATE 2014-11-17 17:30 EST

Based on the chat room conversation, we have discussed

  • Over 2TB of InnoDB data, so setting up replication from scratch is out of the question
  • Discovered that binary logging is enabled on the Slave
  • I proposed that binary logging on the Slave maybe causing the lag
    • I recent encountered this with a client
    • They lag one extra second every two seconds
    • I disabled binary logging on the Slave and it worked
  • My suggestion for this post is to disable binary logging on the Slave like this
    • STOP SLAVE;
    • SET GLOBAL innodb_fast_shutdown = 0;
    • vi /etc/my.cnf
    • Comment out log-bin=master-bin
    • service mysql restart
    • START SLAVE;