MySQL slave stuck at ‘System lock’ most of the time, high CPU usage

linuxMySQLreplication

Let me describe what we are facing now:

We have a MySQL Master-Slave, Row Based Replication setup and at the moment, the slave is stuck with a perpetual System Lock. The Master is an active server with many updates and deletes running, but the slave will just not replicate anything. There are no errors in the log and the slave is reading binary logs fine from the master, but it does nothing on the slave. The Seconds_Behind_Master value keeps increasing. This is what show processlist on slave shows:

mysql> show processlist ;
+----+-------------+-----------+------+---------+-------+----------------------------------+------------------+
| Id | User        | Host      | db   | Command | Time  | State                            | Info             |
+----+-------------+-----------+------+---------+-------+----------------------------------+------------------+
| 10 | system user |           | NULL | Connect |  4985 | Waiting for master to send event | NULL             |
| 11 | system user |           | NULL | Connect | 53715 | System lock                      | NULL             |
| 14 | root        | localhost | NULL | Sleep   |  2958 |                                  | NULL             |
| 16 | root        | localhost | NULL | Query   |     0 | init                             | show processlist |
+----+-------------+-----------+------+---------+-------+----------------------------------+------------------+
4 rows in set (0.00 sec)

And show slave status shows:

SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: System lock

The only sign of life is that Relay_Log_Pos from the slave status is changing values but very slowly. Does it mean that it is running the queries from the binary log, but just that it is too slow?

Pasting some info from show engine innodb status :

LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 0, not started
MySQL thread id 14, OS thread handle 0x7f7824698700, query id 216193 localhost root init
show engine innodb status
---TRANSACTION 230426904, not started
mysql tables in use 1638, locked 1638
MySQL thread id 11, OS thread handle 0x7f7824920700, query id 216192 System lock
--------
FILE I/O
--------
I/O thread 0 state: waiting for completed aio requests (insert buffer thread)
I/O thread 1 state: waiting for completed aio requests (log thread)
I/O thread 2 state: waiting for completed aio requests (read thread)
I/O thread 3 state: waiting for completed aio requests (read thread)
I/O thread 4 state: waiting for completed aio requests (read thread)

Also noticed that the mysqld process for this particular mysql instance has high CPU usage. What could be causing the Slave_SQL_Running_State: System lock and blocking the slave from running the queries? There are no disk issues with the machine and there is another MySQL instance running which is a slave to another MySQL instance (myisam) on the same master and that is running fine.

MySQL version : 5.6.20.
OS : RHEL 6.5
We have some tables that are partitioned (if that matters).

Edit: Just noticed that Slave_SQL_Running_State is occasionally changing to 'Reading event from the relay log'. So it looks like the replication is just too slow.

Best Answer

first we need to understand , there is no specific solution for this .

So as per provided information in Question, what good is :

  1. You have mentioned that relay log position is keep on changing which means "SQL thread" is working

  2. You have mentioned that Slave_SQL_Running_State is also getting change which means I/O thread is also working.

    What bad is :"relay log space is increasing" which means data is coming but taking time to execute.

Lets dwell more here :

  1. Are you observing slowness in Master as well ? Any slow Query in master ? If answer is No , proceed towards 2

  2. Is configuration of both server same ? In this you must check mysql buffers pool configuration and isolation method. I have personally experience that Isolation level can also be culprit ( Repeatable read which is default in MySQL ) If same , proceed towards 3

  3. Have you checked what query is executing at time when you observed slowness. Just do , pager grep Query followed by show processlist;

This will give you an idea what is stucking. Lets go more inside , check relay log position and check in relay log what is the query and try to optimize it. But before that be ensure if it is really performing slow .

You can use performance schema database and in this database , just enable instrument for SQL,IO,Replication and take help of this link : https://dev.mysql.com/doc/refman/5.7/en/performance-schema-replication-tables.html

Apart from this , you can also use SYS schema to diagnose whats going wrong.

What Rolando told is one of the cause and he is correct . Load file can generally cause this issue.

Let me know if i make sense :)