We have a multi-node (each of 64G RAM) cluster. The main (the one we use for mutating data) node randomly just OOMs, this happens both in peak times and in extremely quiet times.
What we've noticed just before the OOM event is usually a bunch of transactions waiting for locks on the main node. MySQL tuner suggests that the max possible usage of mysql is 59G, but when this starts to buildup it exhausts ALL the resources on the box and OOM event happens. There is nothing else running on that box.
The last occurrence of the event was on a quiet period, where there were very small amount of connections (less than 150), with even less actively doing anything.
Can I get any pointers on what may be causing MySQL to go over what is supposed be its maximum in terms of memory?
our my.cnf (sanitised a bit)
[mysqld]
datadir = /var/lib/mysql
long_query_time = 5
slow_query_log = 1
slow_query_log_file = /var/log/mysql/slow_query.log
log_bin = mysql-bin
log_error = /var/log/mysql/error.log
tmpdir = /var/lib/mysql/.tmpdir
binlog_format = ROW
default_storage_engine = InnoDB
expire_logs_days = 28
ft_min_word_len = 1
ft_stopword_file = ''
join_buffer_size = 16M
key_buffer_size = 24M
lc-messages-dir = /usr/share/mysql
lower_case_table_names = 0
max_allowed_packet = 512M
max_connections = 400
max_heap_table_size = 6G
memlock = 0
myisam-recover = BACKUP
query_cache_limit = 4M
query_cache_size = 64M
query_cache_type = 1
read_buffer_size = 2M
read_rnd_buffer_size = 16M
skip-external-locking
skip_name_resolve
sort_buffer_size = 10M
sysdate_is_now = 1
table_open_cache = 1024
thread_cache_size = 512
thread_stack = 192K
tmp_table_size = 6G
explicit_defaults_for_timestamp = 1
innodb_autoinc_lock_mode = 2
innodb_buffer_pool_instances = 8
innodb_buffer_pool_size = 32G
innodb_doublewrite = 1
innodb_file_per_table = 1
innodb_flush_log_at_trx_commit = 2
innodb_flush_method = O_DIRECT
innodb_io_capacity = 200
innodb_log_buffer_size = 96M
innodb_log_files_in_group = 2
innodb_log_file_size = 1024M
innodb_read_io_threads = 4
innodb_thread_concurrency = 0
innodb_locks_unsafe_for_binlog = 1
innodb_write_io_threads = 4
innodb_ft_min_token_size = 0
innodb_stats_on_metadata = 0
Additional information as requested:
number of threads = 16 per node
uses provisioned 1024 iops (aws) with 1024G storage per node
show global status: https://pastebin.com/kpiWrERR
show global variables: https://pastebin.com/x3uXrvJs
mysqltuner output: https://pastebin.com/7uGHQsGy (we've already been through a few iterations of it)
there are various other system processes the run on that box, but none of them are using more than normal and their memory usage is accounted for, im afraid thats all i can share on that.
Best Answer
I found a few serious issues:
Observations:
The More Important Issues:
max_heap_table_size and tmp_table_size are 6G. For your 64G machine, don't set them above 600M (1% of RAM). What may be happening is that multiple big queries are building temp tables (more evidence of that below) and chewing up lots of RAM. It is better to have a temp table spill to disk than lead to swapping or OOM. Also, looking at the slowlog will help identify these naughty queries.
The Query Cache should probably be turned off. It is doing a lot of prunes. The QC is rarely practical in Production systems because every INSERT causes all items in the QC for the relevant table(s) to be purged. This costly. A compromise is to use it selectively (cf DEMAND) and use SQL_CACHE or SQL_NO_CACHE on SELECTs.
Assuming you have SSDs, you should increase innodb_io_capacity (to 1000) and perhaps a few similar settings.
32G buffer_pool, but Innodb_buffer_pool_reads = 344/second -- Sounds like a lot of queries that do table scans. Also 119 temp tables /second; 34 on disk. Use the slowlog to track them down.
Why so many ROLLBACKs?
Why SHOW ENGINE STATUS so often? (once a second!)
Are you monitoring deadlocks? Suggest
innodb_print_all_deadlocks = ON
Some other SHOW commands are quite frequent. Keep in mind that they take a toll on performance. Perhaps run them less frequently.
I rarely find REPLACE to be practical; check your usage.
If you are using FULLTEXT indexes, I suspect that
innodb_ft_min_token_size = 0
is "wrong".If you are using PARTITIONed tables, please provide SHOW CREATE TABLE for a critique.
6-node cluster? -- How far apart are the nodes? Is the network saturated? (Each node talks to each of the other 5 on each write.)
Consider turning on wsrep_log_conflicts .
Details and other observations:
( Key_reads + Key_writes + Innodb_pages_read + Innodb_pages_written + Innodb_dblwr_writes + Innodb_buffer_pool_pages_flushed ) = (340 + 0 + 53517294 + 5637805 + 714833 + 5637805) / 144497 = 453 /sec
-- IOPs? -- If the hardware can handle it, set innodb_io_capacity (now 200) to about this value.( ( Key_reads + Key_writes + Innodb_pages_read + Innodb_pages_written + Innodb_dblwr_writes + Innodb_buffer_pool_pages_flushed ) / innodb_io_capacity / Uptime ) = ( 340 + 0 + 53517294 + 5637805 + 714833 + 5637805 ) / 200 / 144497 = 226.7%
-- This may be a metric indicating what innodb_io_capacity is set reasonably. -- Increase innodb_io_capacity (now 200) if the hardware can handle it.( innodb_buffer_pool_size / innodb_buffer_pool_instances ) = 32768M / 8 = 4096MB
-- Size of each buffer_pool instance. -- An instance should be at least 1GB. In very large RAM, have 16 instances.( innodb_lru_scan_depth ) = 1,024
-- "InnoDB: page_cleaner: 1000ms intended loop took ..." may be fixed by lowering lru_scan_depth( innodb_io_capacity_max / innodb_io_capacity ) = 2,000 / 200 = 10
-- Capacity: max/plain -- Recommend 2. Max should be about equal to the IOPs your I/O subsystem can handle. (If the drive type is unknown 2000/200 may be a reasonable pair.)( Innodb_buffer_pool_reads ) = 49,797,223 / 144497 = 344 /sec
-- Cache misses in the buffer_pool. -- Increase innodb_buffer_pool_size (now 34359738368)? (~100 is limit for HDD, ~1000 is limit for SSDs.)( (Innodb_buffer_pool_reads + Innodb_buffer_pool_pages_flushed) ) = ((49797223 + 5637805) ) / 144497 = 383 /sec
-- InnoDB I/O -- Increase innodb_buffer_pool_size (now 34359738368)?( Innodb_log_writes ) = 7,403,188 / 144497 = 51 /sec
( Com_rollback ) = 2,211,653 / 144497 = 15 /sec
-- ROLLBACKs in InnoDB. -- An excessive frequency of rollbacks may indicate inefficient app logic.( Handler_rollback ) = 4,550,132 / 144497 = 31 /sec
-- Why so many rollbacks?( Innodb_row_lock_waits ) = 20,978 / 144497 = 0.15 /sec
-- How often there is a delay in getting a row lock. -- May be caused by complex queries that could be optimized.( Innodb_deadlocks ) = 1 / 144497 = 0.025 /HR
-- Deadlocks -- SHOW ENGINE INNODB STATUS; to see the latest pair of queries that deadlocked.( innodb_flush_neighbors ) = 1
-- A minor optimization when writing blocks to disk. -- Use 0 for SSD drives; 1 for HDD.( ( Innodb_pages_read + Innodb_pages_written ) / Uptime / innodb_io_capacity ) = ( 53517294 + 5637805 ) / 144497 / 200 = 204.7%
-- If > 100%, need more io_capacity. -- Increase innodb_io_capacity (now 200) if the drives can handle it.( innodb_io_capacity ) = 200
-- I/O ops per second capable on disk . 100 for slow drives; 200 for spinning drives; 1000-2000 for SSDs; multiply by RAID factor.( expand_fast_index_creation ) = expand_fast_index_creation = OFF
-- ALTER and OPTIMIZE may be greatly sped up by using ON. -- Probably better to be ON.( sync_binlog ) = 0
-- Use 1 for added security, at some cost of I/O =1 may lead to lots of "query end"; =0 may lead to "binlog at impossible position" and lose transactions in a crash, but is faster.( innodb_print_all_deadlocks ) = innodb_print_all_deadlocks = OFF
-- Whether to log all Deadlocks. -- If you are plagued with Deadlocks, turn this on. Caution: If you have lots of deadlocks, this may write a lot to disk.( min( tmp_table_size, max_heap_table_size ) ) = (min( 6144M, 6144M )) / 65536M = 9.4%
-- Percent of RAM to allocate when needing MEMORY table (per table), or temp table inside a SELECT (per temp table per some SELECTs). Too high may lead to swapping. -- Decrease tmp_table_size (now 6442450944) and max_heap_table_size (now 6442450944) to, say, 1% of ram.( innodb_buffer_pool_populate ) = OFF = 0
-- NUMA control( character_set_server ) = character_set_server = latin1
-- Charset problems may be helped by setting character_set_server (now latin1) to utf8mb4. That is the future default.( local_infile ) = local_infile = ON
-- local_infile (now ON) = ON is a potential security issue( Qcache_lowmem_prunes ) = 80,501,088 / 144497 = 557 /sec
-- Running out of room in QC -- increase query_cache_size (now 67108864)( Qcache_lowmem_prunes/Qcache_inserts ) = 80,501,088/141879824 = 56.7%
-- Removal Ratio (frequency of needing to prune due to not enough memory)( Qcache_not_cached ) = 40,419,444 / 144497 = 279 /sec
-- SQL_CACHE attempted, but ignored -- Rethink caching; tune qcache( Qcache_inserts - Qcache_queries_in_cache ) = (141879824 - 16616) / 144497 = 981 /sec
-- Invalidations/sec.( (query_cache_size - Qcache_free_memory) / Qcache_queries_in_cache / query_alloc_block_size ) = (64M - 24496808) / 16616 / 8192 = 0.313
-- query_alloc_block_size vs formula -- Adjust query_alloc_block_size (now 8192)( Questions ) = 632,680,020 / 144497 = 4378 /sec
-- Queries (outside SP) -- "qps" -- >2000 may be stressing server( Queries ) = 634,675,672 / 144497 = 4392 /sec
-- Queries (including inside SP) -- >3000 may be stressing server( Created_tmp_tables ) = 17,211,890 / 144497 = 119 /sec
-- Frequency of creating "temp" tables as part of complex SELECTs.( Created_tmp_disk_tables ) = 4,880,505 / 144497 = 34 /sec
-- Frequency of creating disk "temp" tables as part of complex SELECTs -- increase tmp_table_size (now 6442450944) and max_heap_table_size (now 6442450944). Check the rules for temp tables on when MEMORY is used instead of MyISAM. Perhaps minor schema or query changes can avoid MyISAM. Better indexes and reformulation of queries are more likely to help.( tmp_table_size ) = 6144M
-- Limit on size of MEMORY temp tables used to support a SELECT -- Decrease tmp_table_size (now 6442450944) to avoid running out of RAM. Perhaps no more than 64M.( Com_rollback / (Com_commit + Com_rollback) ) = 2,211,653 / (5878741 + 2211653) = 27.3%
-- Rollback : Commit ratio -- Rollbacks are costly; change app logic( Select_full_join ) = 1,238,385 / 144497 = 8.6 /sec
-- joins without index -- Add suitable index(es) to tables used in JOINs.( Select_range_check ) = 369,822 / 144497 = 2.6 /sec
-- no good index -- Find slow queries; check indexes.( Select_scan ) = 10,971,747 / 144497 = 76 /sec
-- full table scans -- Add indexes / optimize queries (unless they are tiny tables)( Select_scan / Com_select ) = 10,971,747 / 182457623 = 6.0%
-- % of selects doing full table scan. (May be fooled by Stored Routines.) -- Add indexes / optimize queries( sort_buffer_size ) = 10M
-- One per thread, malloced at full size until 5.6.4, so keep low; after that bigger is ok. -- This may be eating into available RAM; recommend no more than 2M.( Com_insert + Com_delete + Com_delete_multi + Com_replace + Com_update + Com_update_multi ) = (15287078 + 413037 + 137592 + 2417207 + 40196282 + 118467) / 144497 = 405 /sec
-- writes/sec -- 50 writes/sec + log flushes will probably max out I/O write capacity of normal drives( Com_replace ) = 2,417,207 / 144497 = 17 /sec
-- Consider changing to INSERT ... ON DUPLICATE KEY UPDATE.( binlog_error_action ) = binlog_error_action = IGNORE_ERROR
-- What to do if the binlog cannot be written. -- IGNORE_ERROR is the default for backwards compatibility, but ABORT_SERVER is recommended.( long_query_time ) = 5
-- Cutoff (Seconds) for defining a "slow" query. -- Suggest 2( log_slow_slave_statements ) = log_slow_slave_statements = OFF
-- (5.6.11, 5.7.1) By default, replicated statements won't show up in the slowlog; this causes them to show. -- It can be helpful in the slowlog to see writes that could be interfering with Slave reads.( Connections ) = 4,437,295 / 144497 = 31 /sec
-- Connections -- Increase wait_timeout (now 28800); use pooling?( thread_cache_size / Max_used_connections ) = 512 / 197 = 259.9%
-- There is no advantage in having the thread cache bigger than your likely number of connections. Wasting space is the disadvantage.Abnormally small:
Abnormally large:
Abnormal strings:
Slowlog
http://mysql.rjweb.org/doc.php/mysql_analysis#slow_queries_and_slowlog