MySQL – Understanding Wait State ‘wait/synch/mutex/innodb/lock_mutex’

Tags: amazon-rds, MySQL, mysql-5.7

We have an application (on Amazon RDS MySQL 5.7; db.m5.24xlarge; 96 cores) that ingests a lot of data through parallel processors and queues. Most of it goes to one table, and no primary key is spread over multiple queues. Entries are batched in transactions.

When we increase the number of processors and queues, we hit a tipping point where all wait time is spent on wait/synch/mutex/innodb/lock_mutex, but I can't find out what that means. It even consumes CPU cycles, so are these spinlock mutexes?

Performance insights:

[Image: Performance Insights screenshot]

I already disabled the deprecated query cache and the adaptive hash index, because both are mutex-governed and don't help our INSERT ON DUPLICATE KEY UPDATE heavy workload.

So, what is the resource governed by wait/synch/mutex/innodb/lock_mutex, and do I have control over it, either with settings or perhaps using a lower transaction isolation level?

Edit:

SHOW ENGINE INNODB STATUS shows:

SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 10591578
--Thread 47139002062592 has waited at lock0lock.cc line 6415 for 0  seconds the semaphore:
Mutex at 0x2b2be1000058, Mutex LOCK_SYS created lock0lock.cc:454, lock var 1

wait has ended
--Thread 47138574509824 has waited at lock0lock.cc line 6415 for 0  seconds the semaphore:
Mutex at 0x2b2be1000058, Mutex LOCK_SYS created lock0lock.cc:454, lock var 1

wait has ended
--Thread 47136822810368 has waited at lock0lock.cc line 6342 for 0  seconds the semaphore:
Mutex at 0x2b2be1000058, Mutex LOCK_SYS created lock0lock.cc:454, lock var 1

wait has ended
--Thread 47137720436480 has waited at lock0lock.cc line 6342 for 0  seconds the semaphore:
Mutex at 0x2b2be1000058, Mutex LOCK_SYS created lock0lock.cc:454, lock var 1

and many more
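With many such entries it can help to tally them by mutex and wait site. The following is a minimal sketch, assuming the status text has the same two-line shape as the excerpt above; the function name tally_semaphore_waits and the regular expressions are illustrative, not part of any MySQL tooling.

```python
import re
from collections import Counter

# Sample text copied from the SEMAPHORES excerpt above.
STATUS_TEXT = """\
--Thread 47139002062592 has waited at lock0lock.cc line 6415 for 0  seconds the semaphore:
Mutex at 0x2b2be1000058, Mutex LOCK_SYS created lock0lock.cc:454, lock var 1
--Thread 47138574509824 has waited at lock0lock.cc line 6415 for 0  seconds the semaphore:
Mutex at 0x2b2be1000058, Mutex LOCK_SYS created lock0lock.cc:454, lock var 1
--Thread 47136822810368 has waited at lock0lock.cc line 6342 for 0  seconds the semaphore:
Mutex at 0x2b2be1000058, Mutex LOCK_SYS created lock0lock.cc:454, lock var 1
--Thread 47137720436480 has waited at lock0lock.cc line 6342 for 0  seconds the semaphore:
Mutex at 0x2b2be1000058, Mutex LOCK_SYS created lock0lock.cc:454, lock var 1
"""

# Assumed line shapes, based only on the excerpt above.
WAIT_RE = re.compile(r"--Thread \d+ has waited at (\S+) line (\d+) for")
MUTEX_RE = re.compile(r"Mutex at 0x[0-9a-f]+, Mutex (\S+) created")


def tally_semaphore_waits(text):
    """Count waits per (mutex name, source file, source line)."""
    counts = Counter()
    pending = None  # wait site from the most recent "--Thread" line
    for line in text.splitlines():
        match = WAIT_RE.search(line)
        if match:
            pending = (match.group(1), int(match.group(2)))
            continue
        match = MUTEX_RE.search(line)
        if match and pending is not None:
            counts[(match.group(1), *pending)] += 1
            pending = None
    return counts


waits = tally_semaphore_waits(STATUS_TEXT)
for (mutex, source, line_no), n in waits.most_common():
    print(f"{mutex} waited at {source}:{line_no} x{n}")
```

If every entry names the same mutex, as here with LOCK_SYS, that points at a single contended latch rather than many unrelated ones.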

Edit: also interesting in the MySQL 8 changelog, it says:

InnoDB: To improve concurrency for operations that require access to
lock queues for table and row resources, the lock system mutex
(lock_sys->mutex) was replaced by sharded latches, and lock queues
were grouped into table and page lock queue shards, with each shard
protected by a dedicated mutex. Previously, the single lock system
mutex protected all lock queues, which was a point of contention on
high-concurrency systems. The new sharded implementation permits more
granular access to lock queues.
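To make the difference concrete, here is a conceptual sketch of a single global mutex versus sharded mutexes. It is not InnoDB's implementation: the class names, the shard count of 8, and the (space_id, page_no) key are all assumptions for illustration, and table shards are left out. Because of Python's GIL it shows only the structure, not any speed-up.

```python
import threading


class GlobalLockSys:
    """One mutex guards every lock queue (the arrangement the changelog calls 'previously')."""

    def __init__(self):
        self.mutex = threading.Lock()
        self.queues = {}

    def enqueue(self, page_id, request):
        with self.mutex:  # every thread contends here, whatever page it touches
            self.queues.setdefault(page_id, []).append(request)

    def total_requests(self):
        with self.mutex:
            return sum(len(q) for q in self.queues.values())


class ShardedLockSys:
    """Lock queues are grouped into shards, each with its own mutex."""

    def __init__(self, n_shards=8):  # shard count is an arbitrary assumption
        self.shards = [(threading.Lock(), {}) for _ in range(n_shards)]

    def shard_index(self, page_id):
        return hash(page_id) % len(self.shards)

    def enqueue(self, page_id, request):
        mutex, queues = self.shards[self.shard_index(page_id)]
        with mutex:  # only threads that hit the same shard contend
            queues.setdefault(page_id, []).append(request)

    def total_requests(self):
        total = 0
        for mutex, queues in self.shards:
            with mutex:
                total += sum(len(q) for q in queues.values())
        return total


def run_workers(lock_sys, n_workers=4, n_requests=100):
    """Each worker enqueues requests on its own pages, like queues with disjoint keys."""

    def worker(worker_id):
        for i in range(n_requests):
            page_id = (worker_id, i)  # hypothetical (space_id, page_no) pair
            lock_sys.enqueue(page_id, f"trx{worker_id}")

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return lock_sys.total_requests()


print(run_workers(GlobalLockSys()))
print(run_workers(ShardedLockSys()))
```

In the first class, workers touching unrelated rows still serialize on one mutex, which matches the symptom in the question; in the second they only collide when their pages hash to the same shard.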

So am I affected by that? I'll just have to perform a 'quick' upgrade…

Best Answer

The sharded lock-sys was introduced in this commit.

The commit message is good and explains what lock-sys is (which was my original question):

WL#10314 - InnoDB: Lock-sys optimization: sharded lock_sys mutex

The Lock-sys orchestrates access to tables and rows. Each table, and each row, can be thought of as a resource, and a transaction may request access right for a resource. As two transactions operating on a single resource can lead to problems if the two operations conflict with each other, Lock-sys remembers lists of already GRANTED lock requests and checks new requests for conflicts in which case they have to start WAITING for their turn.

Lock-sys stores both GRANTED and WAITING lock requests in lists known as queues. To allow concurrent operations on these queues, we need a mechanism to latch these queues in safe and quick fashion.

In the past a single latch protected access to all of these queues. This scaled poorly, and the management of queues became a bottleneck. In this WL, we introduce a more granular approach to latching.
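The GRANTED and WAITING queues described in the commit message can be pictured with a small sketch. This is a simplified illustration and not InnoDB code: it assumes only shared (S) and exclusive (X) modes and a strict arrival-order policy, and LockQueue, LockRequest and conflicts are hypothetical names.

```python
from dataclasses import dataclass, field


def conflicts(mode_a, mode_b):
    """S is compatible with S; anything involving X conflicts (simplified)."""
    return mode_a == "X" or mode_b == "X"


@dataclass
class LockRequest:
    trx: str
    mode: str
    status: str = "WAITING"


@dataclass
class LockQueue:
    """Lock requests for one resource (a row or a table), in arrival order."""

    requests: list = field(default_factory=list)

    def request(self, trx, mode):
        # A new request waits if it conflicts with any earlier request
        # from another transaction, granted or still waiting.
        blocked = any(
            r.trx != trx and conflicts(r.mode, mode) for r in self.requests
        )
        req = LockRequest(trx, mode, "WAITING" if blocked else "GRANTED")
        self.requests.append(req)
        return req

    def release(self, trx):
        # Drop the transaction's requests, then grant waiters that no
        # longer conflict with anything ahead of them in the queue.
        self.requests = [r for r in self.requests if r.trx != trx]
        for i, req in enumerate(self.requests):
            if req.status != "WAITING":
                continue
            ahead = self.requests[:i]
            if not any(r.trx != req.trx and conflicts(r.mode, req.mode) for r in ahead):
                req.status = "GRANTED"


row_queue = LockQueue()
first = row_queue.request("trx1", "X")
second = row_queue.request("trx2", "X")
print(first.status, second.status)  # GRANTED WAITING
row_queue.release("trx1")
print(second.status)  # GRANTED
```

Every call to request or release has to read and modify the queue, which is why the queues need a latch; with a single latch for all queues, even transactions working on different rows end up waiting on each other.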

So in MySQL 5.7 this is not something you can control as a user, aside from reducing concurrency.

Test running the system on MySQL 8 shows a marked improvement:

[Image: Amazon Performance Insights showing no wait states]

On a 96-core MySQL server, this is 64 queue processors (= 64 processes) operating on the same table, but all on different rows (mostly updating). The mutex wait state is gone; what remains is CPU and SQL handling, which is good.