Mysql – Percona XtraDB Cluster (5.6.28-76.1-56-log) – random OOM events

memoryMySQLperconaperformance

We have a multi-node (each of 64G RAM) cluster. The main (the one we use for mutating data) node randomly just OOMs, this happens both in peak times and in extremely quiet times.

What we've noticed just before the OOM event is usually a bunch of transactions waiting for locks on the main node. MySQL tuner suggests that the max possible usage of mysql is 59G, but when this starts to buildup it exhausts ALL the resources on the box and OOM event happens. There is nothing else running on that box.

The last occurrence of the event was on a quiet period, where there were very small amount of connections (less than 150), with even less actively doing anything.

Can I get any pointers on what may be causing MySQL to go over what is supposed be its maximum in terms of memory?

our my.cnf (sanitised a bit)

[mysqld]
datadir                         = /var/lib/mysql
long_query_time                 = 5
slow_query_log                  = 1
slow_query_log_file             = /var/log/mysql/slow_query.log
log_bin                         = mysql-bin
log_error                       = /var/log/mysql/error.log
tmpdir                          = /var/lib/mysql/.tmpdir
binlog_format                   = ROW
default_storage_engine          = InnoDB




expire_logs_days                = 28
ft_min_word_len                 = 1
ft_stopword_file                = ''
join_buffer_size                = 16M
key_buffer_size                 = 24M
lc-messages-dir                 = /usr/share/mysql
lower_case_table_names          = 0
max_allowed_packet              = 512M
max_connections                 = 400
max_heap_table_size             = 6G
memlock                         = 0
myisam-recover                  = BACKUP
query_cache_limit               = 4M
query_cache_size                = 64M
query_cache_type                = 1
read_buffer_size                = 2M
read_rnd_buffer_size            = 16M
skip-external-locking
skip_name_resolve
sort_buffer_size                = 10M
sysdate_is_now                  = 1
table_open_cache                = 1024
thread_cache_size               = 512
thread_stack                    = 192K
tmp_table_size                  = 6G

explicit_defaults_for_timestamp = 1


innodb_autoinc_lock_mode        = 2
innodb_buffer_pool_instances    = 8
innodb_buffer_pool_size         = 32G
innodb_doublewrite              = 1
innodb_file_per_table           = 1
innodb_flush_log_at_trx_commit  = 2
innodb_flush_method             = O_DIRECT
innodb_io_capacity              = 200
innodb_log_buffer_size          = 96M
innodb_log_files_in_group       = 2
innodb_log_file_size            = 1024M
innodb_read_io_threads          = 4
innodb_thread_concurrency       = 0
innodb_locks_unsafe_for_binlog  = 1
innodb_write_io_threads         = 4


innodb_ft_min_token_size        = 0
innodb_stats_on_metadata        = 0

Additional information as requested:

number of threads = 16 per node

uses provisioned 1024 iops (aws) with 1024G storage per node

show global status: https://pastebin.com/kpiWrERR

show global variables: https://pastebin.com/x3uXrvJs

mysqltuner output: https://pastebin.com/7uGHQsGy (we've already been through a few iterations of it)

there are various other system processes the run on that box, but none of them are using more than normal and their memory usage is accounted for, im afraid thats all i can share on that.

Best Answer

I found a few serious issues:

Observations:

  • Version: 5.6.28-76.1-56-log
  • 64 GB of RAM
  • Uptime = 1d 16:08:17
  • You are not running on Windows.
  • Running 64-bit version
  • You appear to be running entirely (or mostly) InnoDB.

The More Important Issues:

max_heap_table_size and tmp_table_size are 6G. For your 64G machine, don't set them above 600M (1% of RAM). What may be happening is that multiple big queries are building temp tables (more evidence of that below) and chewing up lots of RAM. It is better to have a temp table spill to disk than lead to swapping or OOM. Also, looking at the slowlog will help identify these naughty queries.

The Query Cache should probably be turned off. It is doing a lot of prunes. The QC is rarely practical in Production systems because every INSERT causes all items in the QC for the relevant table(s) to be purged. This costly. A compromise is to use it selectively (cf DEMAND) and use SQL_CACHE or SQL_NO_CACHE on SELECTs.

Assuming you have SSDs, you should increase innodb_io_capacity (to 1000) and perhaps a few similar settings.

32G buffer_pool, but Innodb_buffer_pool_reads = 344/second -- Sounds like a lot of queries that do table scans. Also 119 temp tables /second; 34 on disk. Use the slowlog to track them down.

Why so many ROLLBACKs?

Why SHOW ENGINE STATUS so often? (once a second!)

Are you monitoring deadlocks? Suggest innodb_print_all_deadlocks = ON

Some other SHOW commands are quite frequent. Keep in mind that they take a toll on performance. Perhaps run them less frequently.

I rarely find REPLACE to be practical; check your usage.

If you are using FULLTEXT indexes, I suspect that innodb_ft_min_token_size = 0 is "wrong".

If you are using PARTITIONed tables, please provide SHOW CREATE TABLE for a critique.

6-node cluster? -- How far apart are the nodes? Is the network saturated? (Each node talks to each of the other 5 on each write.)

Consider turning on wsrep_log_conflicts .

Details and other observations:

( Key_reads + Key_writes + Innodb_pages_read + Innodb_pages_written + Innodb_dblwr_writes + Innodb_buffer_pool_pages_flushed ) = (340 + 0 + 53517294 + 5637805 + 714833 + 5637805) / 144497 = 453 /sec -- IOPs? -- If the hardware can handle it, set innodb_io_capacity (now 200) to about this value.

( ( Key_reads + Key_writes + Innodb_pages_read + Innodb_pages_written + Innodb_dblwr_writes + Innodb_buffer_pool_pages_flushed ) / innodb_io_capacity / Uptime ) = ( 340 + 0 + 53517294 + 5637805 + 714833 + 5637805 ) / 200 / 144497 = 226.7% -- This may be a metric indicating what innodb_io_capacity is set reasonably. -- Increase innodb_io_capacity (now 200) if the hardware can handle it.

( innodb_buffer_pool_size / innodb_buffer_pool_instances ) = 32768M / 8 = 4096MB -- Size of each buffer_pool instance. -- An instance should be at least 1GB. In very large RAM, have 16 instances.

( innodb_lru_scan_depth ) = 1,024 -- "InnoDB: page_cleaner: 1000ms intended loop took ..." may be fixed by lowering lru_scan_depth

( innodb_io_capacity_max / innodb_io_capacity ) = 2,000 / 200 = 10 -- Capacity: max/plain -- Recommend 2. Max should be about equal to the IOPs your I/O subsystem can handle. (If the drive type is unknown 2000/200 may be a reasonable pair.)

( Innodb_buffer_pool_reads ) = 49,797,223 / 144497 = 344 /sec -- Cache misses in the buffer_pool. -- Increase innodb_buffer_pool_size (now 34359738368)? (~100 is limit for HDD, ~1000 is limit for SSDs.)

( (Innodb_buffer_pool_reads + Innodb_buffer_pool_pages_flushed) ) = ((49797223 + 5637805) ) / 144497 = 383 /sec -- InnoDB I/O -- Increase innodb_buffer_pool_size (now 34359738368)?

( Innodb_log_writes ) = 7,403,188 / 144497 = 51 /sec

( Com_rollback ) = 2,211,653 / 144497 = 15 /sec -- ROLLBACKs in InnoDB. -- An excessive frequency of rollbacks may indicate inefficient app logic.

( Handler_rollback ) = 4,550,132 / 144497 = 31 /sec -- Why so many rollbacks?

( Innodb_row_lock_waits ) = 20,978 / 144497 = 0.15 /sec -- How often there is a delay in getting a row lock. -- May be caused by complex queries that could be optimized.

( Innodb_deadlocks ) = 1 / 144497 = 0.025 /HR -- Deadlocks -- SHOW ENGINE INNODB STATUS; to see the latest pair of queries that deadlocked.

( innodb_flush_neighbors ) = 1 -- A minor optimization when writing blocks to disk. -- Use 0 for SSD drives; 1 for HDD.

( ( Innodb_pages_read + Innodb_pages_written ) / Uptime / innodb_io_capacity ) = ( 53517294 + 5637805 ) / 144497 / 200 = 204.7% -- If > 100%, need more io_capacity. -- Increase innodb_io_capacity (now 200) if the drives can handle it.

( innodb_io_capacity ) = 200 -- I/O ops per second capable on disk . 100 for slow drives; 200 for spinning drives; 1000-2000 for SSDs; multiply by RAID factor.

( expand_fast_index_creation ) = expand_fast_index_creation = OFF -- ALTER and OPTIMIZE may be greatly sped up by using ON. -- Probably better to be ON.

( sync_binlog ) = 0 -- Use 1 for added security, at some cost of I/O =1 may lead to lots of "query end"; =0 may lead to "binlog at impossible position" and lose transactions in a crash, but is faster.

( innodb_print_all_deadlocks ) = innodb_print_all_deadlocks = OFF -- Whether to log all Deadlocks. -- If you are plagued with Deadlocks, turn this on. Caution: If you have lots of deadlocks, this may write a lot to disk.

( min( tmp_table_size, max_heap_table_size ) ) = (min( 6144M, 6144M )) / 65536M = 9.4% -- Percent of RAM to allocate when needing MEMORY table (per table), or temp table inside a SELECT (per temp table per some SELECTs). Too high may lead to swapping. -- Decrease tmp_table_size (now 6442450944) and max_heap_table_size (now 6442450944) to, say, 1% of ram.

( innodb_buffer_pool_populate ) = OFF = 0 -- NUMA control

( character_set_server ) = character_set_server = latin1 -- Charset problems may be helped by setting character_set_server (now latin1) to utf8mb4. That is the future default.

( local_infile ) = local_infile = ON -- local_infile (now ON) = ON is a potential security issue

( Qcache_lowmem_prunes ) = 80,501,088 / 144497 = 557 /sec -- Running out of room in QC -- increase query_cache_size (now 67108864)

( Qcache_lowmem_prunes/Qcache_inserts ) = 80,501,088/141879824 = 56.7% -- Removal Ratio (frequency of needing to prune due to not enough memory)

( Qcache_not_cached ) = 40,419,444 / 144497 = 279 /sec -- SQL_CACHE attempted, but ignored -- Rethink caching; tune qcache

( Qcache_inserts - Qcache_queries_in_cache ) = (141879824 - 16616) / 144497 = 981 /sec -- Invalidations/sec.

( (query_cache_size - Qcache_free_memory) / Qcache_queries_in_cache / query_alloc_block_size ) = (64M - 24496808) / 16616 / 8192 = 0.313 -- query_alloc_block_size vs formula -- Adjust query_alloc_block_size (now 8192)

( Questions ) = 632,680,020 / 144497 = 4378 /sec -- Queries (outside SP) -- "qps" -- >2000 may be stressing server

( Queries ) = 634,675,672 / 144497 = 4392 /sec -- Queries (including inside SP) -- >3000 may be stressing server

( Created_tmp_tables ) = 17,211,890 / 144497 = 119 /sec -- Frequency of creating "temp" tables as part of complex SELECTs.

( Created_tmp_disk_tables ) = 4,880,505 / 144497 = 34 /sec -- Frequency of creating disk "temp" tables as part of complex SELECTs -- increase tmp_table_size (now 6442450944) and max_heap_table_size (now 6442450944). Check the rules for temp tables on when MEMORY is used instead of MyISAM. Perhaps minor schema or query changes can avoid MyISAM. Better indexes and reformulation of queries are more likely to help.

( tmp_table_size ) = 6144M -- Limit on size of MEMORY temp tables used to support a SELECT -- Decrease tmp_table_size (now 6442450944) to avoid running out of RAM. Perhaps no more than 64M.

( Com_rollback / (Com_commit + Com_rollback) ) = 2,211,653 / (5878741 + 2211653) = 27.3% -- Rollback : Commit ratio -- Rollbacks are costly; change app logic

( Select_full_join ) = 1,238,385 / 144497 = 8.6 /sec -- joins without index -- Add suitable index(es) to tables used in JOINs.

( Select_range_check ) = 369,822 / 144497 = 2.6 /sec -- no good index -- Find slow queries; check indexes.

( Select_scan ) = 10,971,747 / 144497 = 76 /sec -- full table scans -- Add indexes / optimize queries (unless they are tiny tables)

( Select_scan / Com_select ) = 10,971,747 / 182457623 = 6.0% -- % of selects doing full table scan. (May be fooled by Stored Routines.) -- Add indexes / optimize queries

( sort_buffer_size ) = 10M -- One per thread, malloced at full size until 5.6.4, so keep low; after that bigger is ok. -- This may be eating into available RAM; recommend no more than 2M.

( Com_insert + Com_delete + Com_delete_multi + Com_replace + Com_update + Com_update_multi ) = (15287078 + 413037 + 137592 + 2417207 + 40196282 + 118467) / 144497 = 405 /sec -- writes/sec -- 50 writes/sec + log flushes will probably max out I/O write capacity of normal drives

( Com_replace ) = 2,417,207 / 144497 = 17 /sec -- Consider changing to INSERT ... ON DUPLICATE KEY UPDATE.

( binlog_error_action ) = binlog_error_action = IGNORE_ERROR -- What to do if the binlog cannot be written. -- IGNORE_ERROR is the default for backwards compatibility, but ABORT_SERVER is recommended.

( long_query_time ) = 5 -- Cutoff (Seconds) for defining a "slow" query. -- Suggest 2

( log_slow_slave_statements ) = log_slow_slave_statements = OFF -- (5.6.11, 5.7.1) By default, replicated statements won't show up in the slowlog; this causes them to show. -- It can be helpful in the slowlog to see writes that could be interfering with Slave reads.

( Connections ) = 4,437,295 / 144497 = 31 /sec -- Connections -- Increase wait_timeout (now 28800); use pooling?

( thread_cache_size / Max_used_connections ) = 512 / 197 = 259.9% -- There is no advantage in having the thread cache bigger than your likely number of connections. Wasting space is the disadvantage.

Abnormally small:

Innodb_master_thread_idle_loops = 5
innodb_ft_min_token_size = 0

Abnormally large:

Bytes_received = 1623600 /sec
Bytes_sent = 7156640 /sec
Com_begin = 57 /sec
Com_delete_multi = 0.95 /sec
Com_kill = 0.35 /HR
Com_rename_table = 3.9 /HR
Com_show_binlogs = 12 /HR
Com_show_engine_status = 0.99 /sec
Com_show_keys = 0.073 /sec
Com_show_processlist = 0.99 /sec
Com_show_slave_hosts = 0.075 /HR
Com_show_slave_status_nolock = 24 /HR
Com_show_status = 1.2 /sec
Com_show_storage_engines = 12 /HR
Com_update = 278 /sec
Com_update_multi = 0.82 /sec
Handler_commit = 3457 /sec
Handler_external_lock = 4188 /sec
Handler_prepare = 379 /sec
Handler_read_key = 124114 /sec
Handler_read_last = 2.2 /sec
Handler_read_next = 2416004 /sec
Handler_read_prev = 37046 /sec
Handler_read_rnd = 66035 /sec
Handler_update = 1951 /sec
Innodb_buffer_pool_pages_dirty = 95,455
Innodb_buffer_pool_pages_made_not_young = 11177 /sec
Innodb_buffer_pool_pages_made_young = 323 /sec
Innodb_buffer_pool_pages_misc = 149,890
Innodb_buffer_pool_read_requests = 2506430 /sec
Innodb_data_read = 6068079 /sec
Innodb_data_reads = 370 /sec
Innodb_ibuf_free_list = 156,712
Innodb_ibuf_merged_deletes = 1.6 /sec
Innodb_ibuf_merged_inserts = 15 /sec
Innodb_ibuf_merges = 11 /sec
Innodb_ibuf_segment_size = 156,845
Innodb_ibuf_size = 132
Innodb_rows_read = 3497480 /sec
Innodb_s_lock_spin_rounds = 3604 /sec
Innodb_s_lock_spin_waits = 1130 /sec
Key_write_requests = 331 /sec
Qcache_hits = 2538 /sec
Qcache_inserts = 981 /sec
Select_full_range_join = 17 /sec
Select_full_range_join / Com_select = 1.3%
Select_range = 100 /sec
Sort_range = 151 /sec
Sort_rows = 15547 /sec
Sort_scan = 65 /sec
Table_locks_immediate = 2076 /sec
Threads_cached = 127
auto_increment_offset = 5
max_heap_table_size = 6144MB
min(max_heap_table_size, tmp_table_size) = 6144MB
wsrep_cluster_size = 6
wsrep_slave_threads = 8

Abnormal strings:

Compression = ON
ft_min_word_len = 1
ft_stopword_file = 
innodb_fast_shutdown = 1
innodb_locks_unsafe_for_binlog = ON
optimizer_trace = enabled=off,one_line=off
optimizer_trace_features = greedy_search=on, range_optimizer=on, dynamic_range=on, repeated_subselect=on
slave_rows_search_algorithms = TABLE_SCAN,INDEX_SCAN
wsrep_cluster_status = Primary
wsrep_connected = ON
wsrep_provider = /usr/lib64/galera3/libgalera_smm.so
wsrep_provider_name = Galera
wsrep_provider_vendor = Codership Oy 
wsrep_provider_version = 3.14(r53b88eb)
wsrep_sst_method = xtrabackup-v2

Slowlog

http://mysql.rjweb.org/doc.php/mysql_analysis#slow_queries_and_slowlog