Yesterday, our MySQL response time increased twentyfold: SELECT queries that used to take 0.6-0.7 ms took 20 ms, and UPDATEs even longer, around 60 ms.
The MySQL running-threads count, which usually stays around 10-15, shot up to 300-400 for a good five hours (before coming down when traffic cooled).
Server version: 5.6.25-log MySQL Community Server (GPL), on a 32-core machine with 128 GB of RAM (90 GB InnoDB buffer pool).
The storage engine is InnoDB.
Some output from SHOW ENGINE INNODB STATUS captured by pt-stalk:
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 159291194
OS WAIT ARRAY INFO: signal count 1074930257
Mutex spin waits 1220535470, rounds 2921695607, OS waits 26779994
RW-shared spins 291539762, rounds 1856674850, OS waits 31197621
RW-excl spins 300274870, rounds 5700270052, OS waits 95339247
Spin rounds per wait: 2.39 mutex, 6.37 RW-shared, 18.98 RW-excl
------------
TRANSACTIONS
------------
Trx id counter 20746118705
Purge done for trx's n:o < 20746118666 undo n:o < 5 state: running but idle
History list length 1154
Also, there is a large number of OS waits, specifically at:
InnoDB dict0dict.cc:2606 os_waits=60628
InnoDB dict0dict.cc:2606 os_waits=131984
InnoDB dict0dict.cc:2606 os_waits=81342
InnoDB dict0dict.cc:2606 os_waits=230958
Also at:
InnoDB trx0rseg.cc:196 os_waits=491
InnoDB trx0rseg.cc:196 os_waits=458
InnoDB trx0rseg.cc:196 os_waits=489
InnoDB trx0rseg.cc:196 os_waits=483
InnoDB trx0rseg.cc:196 os_waits=459
Some other information:
1. Query Cache:
| have_query_cache | YES |
| long_query_time | 0.500000 |
| query_alloc_block_size | 8192 |
| query_cache_limit | 10485 |
| query_cache_min_res_unit | 4096 |
| query_cache_size | 0 |
| query_cache_type | OFF |
| query_cache_wlock_invalidate | OFF |
| query_prealloc_size | 8192 |
2. Adaptive Hash Index is also ON.
What could the issue be? Please let me know if you need more details.
vmstat snapshot from that time:
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
17 0 685588 2254064 310852 23913820 0 0 96 404 0 0 3 1 97 0 0
5 0 685588 2256376 310856 23915576 0 0 6816 14092 53915 142794 18 3 79 0 0
7 0 685588 2256284 310860 23916752 0 0 6288 13136 51433 143163 15 2 82 0 0
7 0 685588 2254044 310860 23917792 0 0 7904 18572 52465 142161 16 2 82 0 0
8 0 685588 2254044 310860 23918800 0 0 6512 18244 51958 146916 15 2 82 0 0
6 0 685584 2252828 310860 23919844 4 0 5444 16820 51750 147625 16 2 82 0 0
5 0 685584 2251820 310860 23920884 0 0 6864 14084 53609 146373 15 2 82 0 0
6 0 685584 2251952 310860 23921996 0 0 6016 13112 52862 147601 15 2 82 0 0
10 0 685584 2250164 310864 23923056 0 0 6096 14468 52406 147109 15 2 82 0 0
7 0 685584 2248936 310864 23924112 0 0 6576 13664 51638 142051 15 2 83 0 0
Best Answer
Query cache off -- good.
Large history list -- InnoDB stumbling over itself.
Latency suddenly shot up -- ditto.
Thread count suddenly shot up -- ditto.
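To see what is blocking whom while it is happening, the InnoDB transaction tables in information_schema (present in 5.6; the lock-wait tables were later removed in 8.0) can help. A sketch, not tailored to your schema:

```sql
-- Oldest open transactions: these are what keep the history list
-- growing, since purge cannot advance past them.
SELECT trx_id, trx_state, trx_started, trx_rows_modified, trx_query
FROM information_schema.INNODB_TRX
ORDER BY trx_started
LIMIT 10;

-- Which transaction is waiting on which (5.6-era tables):
SELECT r.trx_id  AS waiting_trx,  r.trx_query AS waiting_query,
       b.trx_id  AS blocking_trx, b.trx_query AS blocking_query
FROM information_schema.INNODB_LOCK_WAITS w
JOIN information_schema.INNODB_TRX r ON r.trx_id = w.requesting_trx_id
JOIN information_schema.INNODB_TRX b ON b.trx_id = w.blocking_trx_id;
```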
Possible causes: an UPDATE or ALTER blocking lots of other threads, or even a SELECT that was poorly written. If the slowlog was on, you may be able to discover what it was: long_query_time = 0.5 implies that (if the slowlog was turned on) it will have caught the naughty query. There would be a lot of normally-fast queries in the log, too.

Also consider lowering max_connections. I know that sounds like a bad remedy, but when MySQL gets into the state you described, it may take a restart to uncork it. Sharing resources among 300-400 threads means that no one will 'ever' get finished. How many clients? Web server(s)? What is the limit on each? Example: Apache's MaxClients.
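A sketch of the relevant knobs, all settable at runtime in 5.6 (no restart needed), so the naughty query gets caught next time:

```sql
-- Enable the slow query log dynamically:
SET GLOBAL slow_query_log = ON;
SET GLOBAL long_query_time = 0.5;   -- already 0.5 on this server

-- Gauge how close the server runs to its connection ceiling:
SHOW GLOBAL STATUS LIKE 'Threads_running';
SHOW GLOBAL STATUS LIKE 'Max_used_connections';
SHOW GLOBAL VARIABLES LIKE 'max_connections';
```

Afterwards, pt-query-digest (from the same Percona Toolkit as the pt-stalk you already used) will rank the slow-log entries by total time consumed, which usually surfaces the culprit quickly.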