Mysql – Percona XtraDB Cluster (5.6.28-76.1-56-log) – random OOM events

memoryMySQLperconaperformance

We have a multi-node (each of 64G RAM) cluster. The main (the one we use for mutating data) node randomly just OOMs, this happens both in peak times and in extremely quiet times.

What we've noticed just before the OOM event is usually a bunch of transactions waiting for locks on the main node. MySQL tuner suggests that the max possible usage of mysql is 59G, but when this starts to buildup it exhausts ALL the resources on the box and OOM event happens. There is nothing else running on that box.

The last occurrence of the event was on a quiet period, where there were very small amount of connections (less than 150), with even less actively doing anything.

Can I get any pointers on what may be causing MySQL to go over what is supposed be its maximum in terms of memory?

our my.cnf (sanitised a bit)

[mysqld]
datadir                         = /var/lib/mysql
long_query_time                 = 5
slow_query_log                  = 1
slow_query_log_file             = /var/log/mysql/slow_query.log
log_bin                         = mysql-bin
log_error                       = /var/log/mysql/error.log
tmpdir                          = /var/lib/mysql/.tmpdir
binlog_format                   = ROW
default_storage_engine          = InnoDB




expire_logs_days                = 28
ft_min_word_len                 = 1
ft_stopword_file                = ''
join_buffer_size                = 16M
key_buffer_size                 = 24M
lc-messages-dir                 = /usr/share/mysql
lower_case_table_names          = 0
max_allowed_packet              = 512M
max_connections                 = 400
max_heap_table_size             = 6G
memlock                         = 0
myisam-recover                  = BACKUP
query_cache_limit               = 4M
query_cache_size                = 64M
query_cache_type                = 1
read_buffer_size                = 2M
read_rnd_buffer_size            = 16M
skip-external-locking
skip_name_resolve
sort_buffer_size                = 10M
sysdate_is_now                  = 1
table_open_cache                = 1024
thread_cache_size               = 512
thread_stack                    = 192K
tmp_table_size                  = 6G

explicit_defaults_for_timestamp = 1


innodb_autoinc_lock_mode        = 2
innodb_buffer_pool_instances    = 8
innodb_buffer_pool_size         = 32G
innodb_doublewrite              = 1
innodb_file_per_table           = 1
innodb_flush_log_at_trx_commit  = 2
innodb_flush_method             = O_DIRECT
innodb_io_capacity              = 200
innodb_log_buffer_size          = 96M
innodb_log_files_in_group       = 2
innodb_log_file_size            = 1024M
innodb_read_io_threads          = 4
innodb_thread_concurrency       = 0
innodb_locks_unsafe_for_binlog  = 1
innodb_write_io_threads         = 4


innodb_ft_min_token_size        = 0
innodb_stats_on_metadata        = 0

Additional information as requested:

number of threads = 16 per node

uses provisioned 1024 iops (aws) with 1024G storage per node

show global status: https://pastebin.com/kpiWrERR

show global variables: https://pastebin.com/x3uXrvJs

mysqltuner output: https://pastebin.com/7uGHQsGy (we've already been through a few iterations of it)

there are various other system processes the run on that box, but none of them are using more than normal and their memory usage is accounted for, im afraid thats all i can share on that.

Best Answer

I found a few serious issues:

Observations:

Version: 5.6.28-76.1-56-log
64 GB of RAM
Uptime = 1d 16:08:17
You are not running on Windows.
Running 64-bit version
You appear to be running entirely (or mostly) InnoDB.

The More Important Issues:

max_heap_table_size and tmp_table_size are 6G. For your 64G machine, don't set them above 600M (1% of RAM). What may be happening is that multiple big queries are building temp tables (more evidence of that below) and chewing up lots of RAM. It is better to have a temp table spill to disk than lead to swapping or OOM. Also, looking at the slowlog will help identify these naughty queries.

The Query Cache should probably be turned off. It is doing a lot of prunes. The QC is rarely practical in Production systems because every INSERT causes all items in the QC for the relevant table(s) to be purged. This costly. A compromise is to use it selectively (cf DEMAND) and use SQL_CACHE or SQL_NO_CACHE on SELECTs.

Assuming you have SSDs, you should increase innodb_io_capacity (to 1000) and perhaps a few similar settings.

32G buffer_pool, but Innodb_buffer_pool_reads = 344/second -- Sounds like a lot of queries that do table scans. Also 119 temp tables /second; 34 on disk. Use the slowlog to track them down.

Why so many ROLLBACKs?

Why SHOW ENGINE STATUS so often? (once a second!)

Are you monitoring deadlocks? Suggest innodb_print_all_deadlocks = ON

Some other SHOW commands are quite frequent. Keep in mind that they take a toll on performance. Perhaps run them less frequently.

I rarely find REPLACE to be practical; check your usage.

If you are using FULLTEXT indexes, I suspect that innodb_ft_min_token_size = 0 is "wrong".

If you are using PARTITIONed tables, please provide SHOW CREATE TABLE for a critique.

6-node cluster? -- How far apart are the nodes? Is the network saturated? (Each node talks to each of the other 5 on each write.)

Consider turning on wsrep_log_conflicts .

Details and other observations:

( Key_reads + Key_writes + Innodb_pages_read + Innodb_pages_written + Innodb_dblwr_writes + Innodb_buffer_pool_pages_flushed ) = (340 + 0 + 53517294 + 5637805 + 714833 + 5637805) / 144497 = 453 /sec -- IOPs? -- If the hardware can handle it, set innodb_io_capacity (now 200) to about this value.

( ( Key_reads + Key_writes + Innodb_pages_read + Innodb_pages_written + Innodb_dblwr_writes + Innodb_buffer_pool_pages_flushed ) / innodb_io_capacity / Uptime ) = ( 340 + 0 + 53517294 + 5637805 + 714833 + 5637805 ) / 200 / 144497 = 226.7% -- This may be a metric indicating what innodb_io_capacity is set reasonably. -- Increase innodb_io_capacity (now 200) if the hardware can handle it.

( innodb_buffer_pool_size / innodb_buffer_pool_instances ) = 32768M / 8 = 4096MB -- Size of each buffer_pool instance. -- An instance should be at least 1GB. In very large RAM, have 16 instances.

( innodb_lru_scan_depth ) = 1,024 -- "InnoDB: page_cleaner: 1000ms intended loop took ..." may be fixed by lowering lru_scan_depth

( innodb_io_capacity_max / innodb_io_capacity ) = 2,000 / 200 = 10 -- Capacity: max/plain -- Recommend 2. Max should be about equal to the IOPs your I/O subsystem can handle. (If the drive type is unknown 2000/200 may be a reasonable pair.)

( Innodb_buffer_pool_reads ) = 49,797,223 / 144497 = 344 /sec -- Cache misses in the buffer_pool. -- Increase innodb_buffer_pool_size (now 34359738368)? (~100 is limit for HDD, ~1000 is limit for SSDs.)

( (Innodb_buffer_pool_reads + Innodb_buffer_pool_pages_flushed) ) = ((49797223 + 5637805) ) / 144497 = 383 /sec -- InnoDB I/O -- Increase innodb_buffer_pool_size (now 34359738368)?

( Innodb_log_writes ) = 7,403,188 / 144497 = 51 /sec

( Com_rollback ) = 2,211,653 / 144497 = 15 /sec -- ROLLBACKs in InnoDB. -- An excessive frequency of rollbacks may indicate inefficient app logic.

( Handler_rollback ) = 4,550,132 / 144497 = 31 /sec -- Why so many rollbacks?

( Innodb_row_lock_waits ) = 20,978 / 144497 = 0.15 /sec -- How often there is a delay in getting a row lock. -- May be caused by complex queries that could be optimized.

( Innodb_deadlocks ) = 1 / 144497 = 0.025 /HR -- Deadlocks -- SHOW ENGINE INNODB STATUS; to see the latest pair of queries that deadlocked.

( innodb_flush_neighbors ) = 1 -- A minor optimization when writing blocks to disk. -- Use 0 for SSD drives; 1 for HDD.

( ( Innodb_pages_read + Innodb_pages_written ) / Uptime / innodb_io_capacity ) = ( 53517294 + 5637805 ) / 144497 / 200 = 204.7% -- If > 100%, need more io_capacity. -- Increase innodb_io_capacity (now 200) if the drives can handle it.

( innodb_io_capacity ) = 200 -- I/O ops per second capable on disk . 100 for slow drives; 200 for spinning drives; 1000-2000 for SSDs; multiply by RAID factor.

( expand_fast_index_creation ) = expand_fast_index_creation = OFF -- ALTER and OPTIMIZE may be greatly sped up by using ON. -- Probably better to be ON.

( sync_binlog ) = 0 -- Use 1 for added security, at some cost of I/O =1 may lead to lots of "query end"; =0 may lead to "binlog at impossible position" and lose transactions in a crash, but is faster.

( innodb_print_all_deadlocks ) = innodb_print_all_deadlocks = OFF -- Whether to log all Deadlocks. -- If you are plagued with Deadlocks, turn this on. Caution: If you have lots of deadlocks, this may write a lot to disk.

( min( tmp_table_size, max_heap_table_size ) ) = (min( 6144M, 6144M )) / 65536M = 9.4% -- Percent of RAM to allocate when needing MEMORY table (per table), or temp table inside a SELECT (per temp table per some SELECTs). Too high may lead to swapping. -- Decrease tmp_table_size (now 6442450944) and max_heap_table_size (now 6442450944) to, say, 1% of ram.

( innodb_buffer_pool_populate ) = OFF = 0 -- NUMA control

( character_set_server ) = character_set_server = latin1 -- Charset problems may be helped by setting character_set_server (now latin1) to utf8mb4. That is the future default.

( local_infile ) = local_infile = ON -- local_infile (now ON) = ON is a potential security issue

( Qcache_lowmem_prunes ) = 80,501,088 / 144497 = 557 /sec -- Running out of room in QC -- increase query_cache_size (now 67108864)

( Qcache_lowmem_prunes/Qcache_inserts ) = 80,501,088/141879824 = 56.7% -- Removal Ratio (frequency of needing to prune due to not enough memory)

( Qcache_not_cached ) = 40,419,444 / 144497 = 279 /sec -- SQL_CACHE attempted, but ignored -- Rethink caching; tune qcache

( Qcache_inserts - Qcache_queries_in_cache ) = (141879824 - 16616) / 144497 = 981 /sec -- Invalidations/sec.

( (query_cache_size - Qcache_free_memory) / Qcache_queries_in_cache / query_alloc_block_size ) = (64M - 24496808) / 16616 / 8192 = 0.313 -- query_alloc_block_size vs formula -- Adjust query_alloc_block_size (now 8192)

( Questions ) = 632,680,020 / 144497 = 4378 /sec -- Queries (outside SP) -- "qps" -- >2000 may be stressing server

( Queries ) = 634,675,672 / 144497 = 4392 /sec -- Queries (including inside SP) -- >3000 may be stressing server

( Created_tmp_tables ) = 17,211,890 / 144497 = 119 /sec -- Frequency of creating "temp" tables as part of complex SELECTs.

( Created_tmp_disk_tables ) = 4,880,505 / 144497 = 34 /sec -- Frequency of creating disk "temp" tables as part of complex SELECTs -- increase tmp_table_size (now 6442450944) and max_heap_table_size (now 6442450944). Check the rules for temp tables on when MEMORY is used instead of MyISAM. Perhaps minor schema or query changes can avoid MyISAM. Better indexes and reformulation of queries are more likely to help.

( tmp_table_size ) = 6144M -- Limit on size of MEMORY temp tables used to support a SELECT -- Decrease tmp_table_size (now 6442450944) to avoid running out of RAM. Perhaps no more than 64M.

( Com_rollback / (Com_commit + Com_rollback) ) = 2,211,653 / (5878741 + 2211653) = 27.3% -- Rollback : Commit ratio -- Rollbacks are costly; change app logic

( Select_full_join ) = 1,238,385 / 144497 = 8.6 /sec -- joins without index -- Add suitable index(es) to tables used in JOINs.

( Select_range_check ) = 369,822 / 144497 = 2.6 /sec -- no good index -- Find slow queries; check indexes.

( Select_scan ) = 10,971,747 / 144497 = 76 /sec -- full table scans -- Add indexes / optimize queries (unless they are tiny tables)

( Select_scan / Com_select ) = 10,971,747 / 182457623 = 6.0% -- % of selects doing full table scan. (May be fooled by Stored Routines.) -- Add indexes / optimize queries

( sort_buffer_size ) = 10M -- One per thread, malloced at full size until 5.6.4, so keep low; after that bigger is ok. -- This may be eating into available RAM; recommend no more than 2M.

( Com_insert + Com_delete + Com_delete_multi + Com_replace + Com_update + Com_update_multi ) = (15287078 + 413037 + 137592 + 2417207 + 40196282 + 118467) / 144497 = 405 /sec -- writes/sec -- 50 writes/sec + log flushes will probably max out I/O write capacity of normal drives

( Com_replace ) = 2,417,207 / 144497 = 17 /sec -- Consider changing to INSERT ... ON DUPLICATE KEY UPDATE.

( binlog_error_action ) = binlog_error_action = IGNORE_ERROR -- What to do if the binlog cannot be written. -- IGNORE_ERROR is the default for backwards compatibility, but ABORT_SERVER is recommended.

( long_query_time ) = 5 -- Cutoff (Seconds) for defining a "slow" query. -- Suggest 2

( log_slow_slave_statements ) = log_slow_slave_statements = OFF -- (5.6.11, 5.7.1) By default, replicated statements won't show up in the slowlog; this causes them to show. -- It can be helpful in the slowlog to see writes that could be interfering with Slave reads.

( Connections ) = 4,437,295 / 144497 = 31 /sec -- Connections -- Increase wait_timeout (now 28800); use pooling?

( thread_cache_size / Max_used_connections ) = 512 / 197 = 259.9% -- There is no advantage in having the thread cache bigger than your likely number of connections. Wasting space is the disadvantage.

Abnormally small:

Innodb_master_thread_idle_loops = 5
innodb_ft_min_token_size = 0

Abnormally large:

Bytes_received = 1623600 /sec
Bytes_sent = 7156640 /sec
Com_begin = 57 /sec
Com_delete_multi = 0.95 /sec
Com_kill = 0.35 /HR
Com_rename_table = 3.9 /HR
Com_show_binlogs = 12 /HR
Com_show_engine_status = 0.99 /sec
Com_show_keys = 0.073 /sec
Com_show_processlist = 0.99 /sec
Com_show_slave_hosts = 0.075 /HR
Com_show_slave_status_nolock = 24 /HR
Com_show_status = 1.2 /sec
Com_show_storage_engines = 12 /HR
Com_update = 278 /sec
Com_update_multi = 0.82 /sec
Handler_commit = 3457 /sec
Handler_external_lock = 4188 /sec
Handler_prepare = 379 /sec
Handler_read_key = 124114 /sec
Handler_read_last = 2.2 /sec
Handler_read_next = 2416004 /sec
Handler_read_prev = 37046 /sec
Handler_read_rnd = 66035 /sec
Handler_update = 1951 /sec
Innodb_buffer_pool_pages_dirty = 95,455
Innodb_buffer_pool_pages_made_not_young = 11177 /sec
Innodb_buffer_pool_pages_made_young = 323 /sec
Innodb_buffer_pool_pages_misc = 149,890
Innodb_buffer_pool_read_requests = 2506430 /sec
Innodb_data_read = 6068079 /sec
Innodb_data_reads = 370 /sec
Innodb_ibuf_free_list = 156,712
Innodb_ibuf_merged_deletes = 1.6 /sec
Innodb_ibuf_merged_inserts = 15 /sec
Innodb_ibuf_merges = 11 /sec
Innodb_ibuf_segment_size = 156,845
Innodb_ibuf_size = 132
Innodb_rows_read = 3497480 /sec
Innodb_s_lock_spin_rounds = 3604 /sec
Innodb_s_lock_spin_waits = 1130 /sec
Key_write_requests = 331 /sec
Qcache_hits = 2538 /sec
Qcache_inserts = 981 /sec
Select_full_range_join = 17 /sec
Select_full_range_join / Com_select = 1.3%
Select_range = 100 /sec
Sort_range = 151 /sec
Sort_rows = 15547 /sec
Sort_scan = 65 /sec
Table_locks_immediate = 2076 /sec
Threads_cached = 127
auto_increment_offset = 5
max_heap_table_size = 6144MB
min(max_heap_table_size, tmp_table_size) = 6144MB
wsrep_cluster_size = 6
wsrep_slave_threads = 8

Abnormal strings:

Compression = ON
ft_min_word_len = 1
ft_stopword_file = 
innodb_fast_shutdown = 1
innodb_locks_unsafe_for_binlog = ON
optimizer_trace = enabled=off,one_line=off
optimizer_trace_features = greedy_search=on, range_optimizer=on, dynamic_range=on, repeated_subselect=on
slave_rows_search_algorithms = TABLE_SCAN,INDEX_SCAN
wsrep_cluster_status = Primary
wsrep_connected = ON
wsrep_provider = /usr/lib64/galera3/libgalera_smm.so
wsrep_provider_name = Galera
wsrep_provider_vendor = Codership Oy 
wsrep_provider_version = 3.14(r53b88eb)
wsrep_sst_method = xtrabackup-v2

Slowlog

http://mysql.rjweb.org/doc.php/mysql_analysis#slow_queries_and_slowlog

Related Solutions

Mysql – How to debug a db memory-leak causing thesql to go before it’s own limits

...even surpassing it's theorically maximum possible allocation.

[OK] Maximum possible memory usage: 7.3G (46% of installed RAM)

There is not actually a way to calculate maximum possible memory usage for MySQL, because there is no cap on the memory it can request from the system.

The calculation done by mysqltuner.pl is only an estimate, based on a formula that doesn't take into account all possible variables, because if all possible variables were taken into account, the answer would always be "infinite." It's unfortunate that it's labeled this way.

Here is my theory on what's contributing to your excessive memory usage:

thread_cache_size       = 128

Given that your max_connections is set to 200, the value of 128 for thread_cache_size seems far too high. Here's what makes me think this might be contributing to your problem:

When a thread is no longer needed, the memory allocated to it is released and returned to the system unless the thread goes back into the thread cache. In that case, the memory remains allocated.

^{http://dev.mysql.com/doc/refman/5.6/en/memory-use.html}

If your workload causes even an occasional client thread to require a large amount of memory, those threads may be holding onto that memory, then going back to the pool and sitting around, continuing to hold on to memory they don't technically "need" any more, on the premise that holding on to the memory is less costly than releasing it if you're likely to need it again.

I think it's worth a try to do the following, after first making a note of how much memory MySQL is using at the moment.

Note how many threads are currently cached:

mysql> show status like 'Threads_cached';
+----------------+-------+
| Variable_name  | Value |
+----------------+-------+
| Threads_cached | 9     |
+----------------+-------+
1 row in set (0.00 sec)

Next, disable the thread cache.

mysql> SET GLOBAL thread_cache_size = 0;

This disables the thread cache, but the cached threads will stay in the pool until they're used one more time. Disconnect from the server, then reconnect and repeat.

mysql> show status like 'Threads_cached';

Continue disconnecting, reconnecting, and checking until the counter reaches 0.

Then, see how much memory MySQL is holding.

You may see a decrease, possibly significant, and then again you may not. I tested this on one of my systems, which had 9 threads in the cache. Once those threads had all been cleared out of the cache, the total memory held by MySQL did decrease... not by much, but it does illustrate that threads in the cache do release at least some memory when they are destroyed.

If you see a significant decrease, you may have found your problem. If you don't, then there's one more thing that needs to happen, and how quickly it can happen depends on your environment.

If the theory holds that the other threads -- the ones currently servicing active client connections -- have significant memory allocated to them, either because of recent work in their current client session or because of work requiring a lot of memory that was done by another connection prior to them languishing in the pool, then you won't see all of the potential reduction in memory consumption until those threads are allowed to die and be destroyed. Presumably your application doesn't hold them forever, but how long it will take to know for sure whether there's a difference will depend on whether you have the option of cycling your application (dropping and reconnecting the client threads) or if you'll have to just wait for them to be dropped and reconnected over time on their own.

But... it seems like a worthwhile test. You should not see a substantial performance penalty by setting thread_cache_size to 0. Fortunately, thread_cache_size is a dynamic variable, so you can freely change it with the server running.

Mysql – OOM in thesql-percona

I would suggest that even though your innodb_buffer_pool_size is undoubtedly too large for the available memory (192GB would be the theoretical max "sane" value) your thread-cache-size is likely to be what's occasionally pushing you over the edge.

There's no rational justification for such a large value, particularly with max connections limited to 1200.

I will speculate that shrinking the buffer pool size may lengthen the interval between oom events, and you definitely need to do that... but reducing the thread cache size value may be necessary to eliminate oom events altogether.

Individual client threads can have essentially unbounded memory growth, depending on what's needed to execute any particular query. (That's why the memory usage calculation formulas you'll see that multiply by max connections are essentially worthless for calculating meaningful numbers -- they only give you the least-worst-case scenario).

With the thread cache at 600, you are leaving a door open for a hundreds of memory-intensive queries to leave behind a cached thread holding on to memory that may not be cleaned up for a long time, if ever. It's not technically a "leak," but it behaves similarly.

Decrease this variable below the point where the SHOW STATUS LIKE 'threads_created'; never reaches the point where it stops growing somewhat consistently during moderate to high traffic periods... unless you have an incredibly bursty demand for the number of concurrent connections, you only need a very small number of threads in the thread cache.

The only circumstance I recall where I needed this value to be larger was an application that, on start/restart, attempted to make 48 new client connections in parallel at easentially exactly the same time. (Yes, they had their reasons). Without some number of cached threads available, the server couldn't reliably establish this many new connections so quickly.

And that's what the thread cache does -- reduces the resource cost of new connections to the server, at the expense of maintaining the old ones.

Note, the query cache is also probably too large for good performance. It's essentially unrelated to the oom issue since it doesn't grow, but it's still excessive, since the qcache is a choke point for every query, and a larger cache takes more processing time to manage.

Best Answer

Related Solutions

Mysql – How to debug a db memory-leak causing thesql to go before it’s own limits

Mysql – OOM in thesql-percona

Related Question