MySQL – OOM in mysql-percona

innodb | memory | MySQL | percona

Every fourth day or so I run into an OOM problem.

[Sat Dec 27 12:20:21 2014] Out of memory: Kill process 3662 (mysqld) score 990 or sacrifice child
[Sat Dec 27 12:20:21 2014] Killed process 3662 (mysqld) total-vm:267580568kB, anon-rss:249094644kB, file-rss:0kB

When I check the MySQL error log I see the output below. mysqld then gets restarted automatically, and at the time of the restart the load average is 60.

InnoDB: Warning: a long semaphore wait:
--Thread 140212527941376 has waited at dict0dict.cc line 1027 for 248.00 seconds the semaphore:
Mutex at 0x17556e048 '&dict_sys->mutex', lock var 1
waiters flag 1
InnoDB: ###### Starts InnoDB Monitor for 30 secs to print diagnostic info:
InnoDB: Pending preads 0, pwrites 0

=====================================
2014-12-27 12:07:22 7f8677fff700 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 17 seconds
-----------------
BACKGROUND THREAD
-----------------
srv_master_thread loops: 398108 srv_active, 0 srv_shutdown, 91 srv_idle
srv_master_thread log flush and writes: 398198
----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 5067811
--Thread 140215609943808 has waited at srv0srv.cc line 2605 for 210.00 seconds the semaphore:
X-lock on RW-latch at 0x1366220 '&dict_operation_lock'
a writer (thread id 140215473645312) has reserved it in mode  exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time read locked in file row0undo.cc line 298
Last time write locked in file /mnt/workspace/percona-server-5.6-debian-binary/label_exp/ubuntu-trusty-64bit/percona-server-5.6-5.6.21-70.1/storage/innobase/dict/dict0stats.cc line
2385
--Thread 140212556961536 has waited at dict0dict.cc line 1027 for 169.00 seconds the semaphore:
Mutex at 0x17556e048 '&dict_sys->mutex', lock var 1
waiters flag 1
--Thread 140212527941376 has waited at dict0dict.cc line 1027 for 269.00 seconds the semaphore:
Mutex at 0x17556e048 '&dict_sys->mutex', lock var 1
waiters flag 1
--Thread 140212532467456 has waited at dict0dict.cc line 1027 for 240.00 seconds the semaphore:
Mutex at 0x17556e048 '&dict_sys->mutex', lock var 1
waiters flag 1
--Thread 140215449335552 has waited at dict0dict.cc line 1027 for 58.000 seconds the semaphore:
Mutex at 0x17556e048 '&dict_sys->mutex', lock var 1
waiters flag 1
OS WAIT ARRAY INFO: signal count 1139329316
Mutex spin waits 275600240, rounds 166051177, OS waits 1327651
RW-shared spins 569536954, rounds 1645302082, OS waits 2312389
RW-excl spins 122854873, rounds 1740073086, OS waits 910606
Spin rounds per wait: 0.60 mutex, 2.89 RW-shared, 14.16 RW-excl
------------
TRANSACTIONS
------------
Trx id counter 1800673801
Purge done for trx's n:o < 1800671398 undo n:o < 0 state: running but idle
History list length 1002
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 0, not started
MySQL thread id 3549436, OS thread handle 0x7f85c58a4700, query id 1264087983 ec2-175-41-139-224.ap-southeast-1.compute.amazonaws.com 175.41.139.224 readonly init
...
.....

RECORD LOCKS space id 2 page no 100 n bits 184 index "PRIMARY" of table "mysql"."innodb_index_stats" trx id 1800673786 lock_mode X
---Killed
141227 12:20:29 mysqld_safe Number of processes running now: 0
141227 12:20:29 mysqld_safe mysqld restarted

Below is my my.cnf:

[mysql]

# CLIENT #
port                           = 3306
socket                         = /dbvol/mysql/var/run/mysqld/mysqld.sock

[mysqld]

# GENERAL #
user                           = mysql
default-storage-engine         = InnoDB
socket                         = /dbvol/mysql/var/run/mysqld/mysqld.sock
pid-file                       = /dbvol/mysql/var/run/mysqld/mysqld.pid

# MyISAM #
key-buffer-size                = 1G
myisam-recover                 = FORCE,BACKUP

# SAFETY #
max-allowed-packet             = 512M
max-connect-errors             = 1000000

# DATA STORAGE #
datadir                        = /dbvol/mysql/lib/mysql/

# BINARY LOGGING #
binlog-format              = MIXED
log-bin                        = /dbvol/mysql/lib/mysql/mysql-bin
expire-logs-days               = 21
sync-binlog                    = 1

# REPLICATION #

read-only                      = 1
skip-slave-start               = 1
log-slave-updates              = 0
relay-log                      = /dbvol/mysql/lib/mysql/relay-bin
slave-net-timeout              = 60
sync-master-info               = 10000
#sync-relay-log                = 10000
sync-relay-log-info            = 10000
server-id                      = 21300

# CACHES AND LIMITS #
tmp-table-size                 = 32M
max-heap-table-size            = 32M
query-cache-type               = 1
query-cache-size               = 1024M
max-connections                = 1200
thread-cache-size              = 600
open-files-limit               = 65535
table-definition-cache         = 4096
table-open-cache-instances     = 16
table-open-cache               = 1500

# INNODB #
innodb-flush-method            = O_DIRECT
innodb-log-files-in-group      = 2
innodb-log-file-size           = 512M
innodb-flush-log-at-trx-commit = 1
innodb-file-per-table          = 1
innodb-buffer-pool-size        = 220G

# LOGGING #
log-error                      = /dbvol/mysql/log/mysql-error.log
log-queries-not-using-indexes  = 1
slow-query-log                 = 1
slow-query-log-file            = /dbvol/mysql/log/mysql-slow.log
long_query_time                = 5
slow_query_log_use_global_control = log_slow_rate_limit
log_slow_rate_limit = 100

Server Capacity:

CPU : 32
MEM : 240G
SSD: 3TB

Server version: 5.6.21-70.1-log Percona Server (GPL), Release 70.1, Revision 698

Any ideas how to fix this? Is my innodb_buffer_pool_size in my.cnf too large?

Best Answer

I would suggest that even though your innodb_buffer_pool_size is undoubtedly too large for the available memory (192GB would be the theoretical maximum "sane" value), your thread-cache-size is likely what's occasionally pushing you over the edge.

There's no rational justification for such a large value, particularly with max connections limited to 1200.

I will speculate that shrinking the buffer pool may only lengthen the interval between OOM events (and you definitely need to do that)... but reducing the thread cache size may be what's necessary to eliminate OOM events altogether.
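As a rough sketch of the kind of change I mean (the specific numbers below are illustrative starting points, not tuned values for your workload -- adjust and measure):

# INNODB #
innodb-buffer-pool-size        = 180G   # well below physical RAM, leaving headroom for sessions and the OS

# CACHES AND LIMITS #
thread-cache-size              = 32     # small cache; creating a thread is cheap compared to hoarding its memory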

Individual client threads can have essentially unbounded memory growth, depending on what's needed to execute any particular query. (That's why the memory usage formulas you'll see that multiply per-connection buffers by max_connections are essentially worthless for producing meaningful numbers -- they only give you the least-worst-case scenario.)
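To put rough numbers on that, using only the fixed allocations from the posted my.cnf and ignoring every per-connection buffer:

220G (InnoDB buffer pool) + 1G (key buffer) + 1G (query cache) = 222G

That leaves under 20G of the 240G box for the OS, the filesystem cache, and all per-session memory for up to 1200 connections, so even modest per-query allocations can push the process into OOM-killer territory.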

With the thread cache at 600, you are leaving the door open for hundreds of memory-intensive queries to each leave behind a cached thread holding on to memory that may not be cleaned up for a long time, if ever. It's not technically a "leak," but it behaves similarly.

Decrease this variable until the counter from SHOW STATUS LIKE 'Threads_created'; resumes growing slowly but somewhat consistently during moderate- to high-traffic periods. Unless you have an incredibly bursty demand for concurrent connections, you only need a very small number of threads in the thread cache.
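A simple way to watch this, using standard status counters that apply to Percona Server 5.6 as well as stock MySQL:

-- How often mysqld has had to create a brand-new thread, vs. total connections served.
-- If Threads_created barely moves while the cache is huge, the cache is oversized.
SHOW GLOBAL STATUS LIKE 'Threads_created';
SHOW GLOBAL STATUS LIKE 'Threads_cached';
SHOW GLOBAL STATUS LIKE 'Connections';

-- Current setting, for reference.
SHOW GLOBAL VARIABLES LIKE 'thread_cache_size';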

The only circumstance I recall where I needed this value to be larger was an application that, on start/restart, attempted to make 48 new client connections in parallel at essentially the same instant. (Yes, they had their reasons.) Without some number of cached threads available, the server couldn't reliably establish that many new connections so quickly.

And that's what the thread cache does -- reduces the resource cost of new connections to the server, at the expense of maintaining the old ones.


Note that the query cache is also probably too large for good performance. It's essentially unrelated to the OOM issue, since it doesn't grow beyond its configured size, but 1024M is still excessive: the query cache is a choke point for every query, and a larger cache takes more processing time to manage.
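If you want to check how much the query cache is actually helping before shrinking it, the Qcache status counters are the place to look; a much smaller size (tens of MB), or disabling it entirely with query-cache-type = 0 and query-cache-size = 0, is the usual advice for a busy server:

SHOW GLOBAL STATUS LIKE 'Qcache%';   -- compare Qcache_hits vs. Qcache_inserts, and watch Qcache_lowmem_prunes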