MySQL will not release global buffers after they are used (innodb_buffer_pool_size, key_buffer_size, query_cache_size, etc.). It will release memory taken by per-thread buffers (join_buffer_size, sort_buffer_size, etc.), but only after the thread is removed and not while it sits in the thread_cache.
The innodb_buffer_pool_size can be tuned dynamically in 5.7, but it is a very invasive operation:
When initiating a resizing operation, the operation does not start until all active transactions are completed. Once the resizing operation is in progress, new transactions and operations that require access to the buffer pool must wait until the resizing operation finishes. The exception to this rule is that concurrent access to the buffer pool is permitted while the buffer pool is defragmented and pages are withdrawn during an operation to decrease buffer pool size. A drawback of allowing concurrent access is that it could result in a temporary shortage of available pages while pages are being withdrawn.
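For example, the resize can be initiated and monitored from any client session. A minimal sketch (the 8GB value is only a placeholder; MySQL rounds the size up to a multiple of innodb_buffer_pool_chunk_size * innodb_buffer_pool_instances if needed):

-- Initiate an online buffer pool resize (5.7+):
SET GLOBAL innodb_buffer_pool_size = 8 * 1024 * 1024 * 1024;
-- Watch the progress of the resize operation:
SHOW STATUS LIKE 'InnoDB_buffer_pool_resize_status';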
Is the additional buffer pool worth it? This query will give you an idea of the current maximum memory that will be used by the InnoDB buffer pool:
SELECT ENGINE,
       SUM(DATA_LENGTH)/1024/1024/1024 AS DATA_LENGTH_IN_GB,
       SUM(INDEX_LENGTH)/1024/1024/1024 AS INDEX_LENGTH_IN_GB,
       SUM(DATA_LENGTH+INDEX_LENGTH)/1024/1024/1024 AS TOTAL_IN_GB
FROM information_schema.TABLES
WHERE ENGINE = 'InnoDB'
  AND TABLE_SCHEMA NOT IN ('mysql', 'information_schema', 'performance_schema')
GROUP BY ENGINE
ORDER BY TOTAL_IN_GB DESC;
However, that is the total InnoDB size of data + indexes. The working set is what matters during peak times, and it consists of your most active tables and rows. If your working set already fits in memory, allocating more is not going to buy you much. If it doesn't fit, you're going to want to increase the buffer pool regardless of peak/non-peak traffic.
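One rough way to gauge whether the working set fits is the buffer pool miss rate, measured over a long uptime rather than right after a restart (a sketch using standard status counters):

-- Innodb_buffer_pool_reads = logical reads that had to go to disk;
-- Innodb_buffer_pool_read_requests = all logical read requests.
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%';
-- miss rate = Innodb_buffer_pool_reads / Innodb_buffer_pool_read_requests
-- A miss rate well under 1% suggests the working set fits in memory.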
I would recommend reading the internals section of the MySQL documentation on the InnoDB buffer pool; it is a great primer.
As for the rest of your configuration, the only thing I will say is that your tmp_table_size=333M is not doing what you think it is, and is likely too high.
First:
The maximum size for in-memory temporary tables is the minimum of the tmp_table_size and max_heap_table_size values.
This means that, with max_heap_table_size defaulting to 16MB, in-memory temporary tables are going to be converted to on-disk temporary tables long before reaching 333MB.
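A quick check of the effective in-memory limit on your own server (a minimal sketch):

SELECT LEAST(@@tmp_table_size, @@max_heap_table_size) AS effective_in_memory_limit;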
Second, you might have increased tmp_table_size to avoid on-disk temporary table usage. But there are scenarios where MySQL can never use in-memory temporary tables:
Some conditions prevent the use of an in-memory temporary table, in which case the server uses an on-disk table instead:
Presence of a BLOB or TEXT column in the table
Presence of any string column in a GROUP BY or DISTINCT clause larger than 512 bytes for binary strings or 512 characters for nonbinary strings. (Before MySQL 5.7.3, the limit is 512 bytes regardless of string type.)
Presence of any string column with a maximum length larger than 512 (bytes for binary strings, characters for nonbinary strings) in the SELECT list, if UNION or UNION ALL is used
The SHOW COLUMNS and the DESCRIBE statements use BLOB as the type for some columns, thus the temporary table used for the results is an on-disk table.
I would recommend setting tmp_table_size and max_heap_table_size to 32M or 64M (max) and then examining queries for the conditions mentioned above. If you can't avoid on-disk temporary tables, hopefully your cloud provider puts tmpdir on SSD.
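A sketch of that change (the 64M figure follows the recommendation above; persist the same values in my.cnf so they survive a restart):

SET GLOBAL tmp_table_size = 64 * 1024 * 1024;
SET GLOBAL max_heap_table_size = 64 * 1024 * 1024;
-- Note: changes to the GLOBAL values only affect connections opened
-- after the change; existing sessions keep their old values.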
600 writes/sec with a flush per commit is probably hitting the limit of your current spinning disks. Switching to SSDs would relieve the pressure.
The quick fix (before getting SSDs) is probably to change to this setting:
innodb_flush_log_at_trx_commit = 2
But read the caveats on making that change.
Having that setting and SSDs would let you grow further.
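The setting is dynamic, so it can be tried without a restart. A sketch, with the usual caveat spelled out (with a value of 2, the log is written to the OS at each commit but fsynced only about once per second, so up to roughly one second of committed transactions can be lost on an OS crash or power failure):

SET GLOBAL innodb_flush_log_at_trx_commit = 2;
-- Persist the setting in my.cnf as well if you decide to keep it.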
Another possible fix is to combine some writes into a single COMMIT (where the logic is not violated).
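For example, wrapping a group of related writes in one transaction turns several log flushes into one. A sketch with hypothetical tables:

START TRANSACTION;
INSERT INTO orders (customer_id, total) VALUES (42, 19.99);              -- hypothetical table
INSERT INTO order_items (order_id, sku) VALUES (LAST_INSERT_ID(), 'A1'); -- hypothetical table
COMMIT;  -- one log flush for the whole group instead of one per statement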
Almost always, high CPU and/or I/O is due to poor indexes and/or poorly formulated queries. Turn on the slowlog with long_query_time=1, wait a while, then see what turns up. With queries in hand, provide the SELECT, EXPLAIN SELECT ..., and SHOW CREATE TABLE. Ditto for the write queries. From those, we can probably tame the CPU and/or I/O. Even with your current long_query_time setting of 3, pt-query-digest might find some interesting things.
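Enabling the slowlog dynamically looks like this (a sketch; the log file path is only an example and varies by installation):

SET GLOBAL slow_query_log = ON;
SET GLOBAL long_query_time = 1;  -- log statements slower than 1 second
-- After collecting for a while, summarize the log with:
--   pt-query-digest /var/lib/mysql/hostname-slow.log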
Note that with 50 "running" threads, there is a lot of contention; this may be causing the switching, etc, that you noted. We need to get queries to finish faster. With 5.7, the system may keel over with 100 running threads. Increasing beyond about 64, the context switches, mutexes, locks, etc, conspire to slow down every thread, leading to no improvement in throughput while latency goes through the roof.
For a different approach to analyzing the problem, please provide SHOW VARIABLES and SHOW GLOBAL STATUS output. More discussion here.
Analysis of VARIABLES & STATUS
(Sorry, nothing jumps out as addressing your Question.)
Observations:
- Version: 5.7.20-log
- 700 GB of RAM
- Uptime = 36d 13:21:34
- You are not running on Windows.
- Running 64-bit version
- You appear to be running entirely (or mostly) InnoDB.
The More Important Issues:
A lot of temp tables, especially disk-based, are created for complex queries. Let's hope that the slowlog will identify some queries that can be improved (via indexing / reformulating / etc.).
Other indicators are joins without indexes and sort_merge_passes; however, neither of these is conclusive, so we need to see the queries.
Max_used_connections = 701 is >= Max_connections = 700, so some connections were probably refused (see the check below). Also, if that indicated more than, say, 64 threads running, then system performance probably suffered at that time.
Consider throttling the number of connections by throttling the clients. Are you using Apache, Tomcat, or something else?
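You can confirm whether connections were actually refused; since 5.6.5 there is a dedicated counter:

-- Attempts refused because max_connections was already reached:
SHOW GLOBAL STATUS LIKE 'Connection_errors_max_connections';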
70 Threads_running indicates that, at the moment of this SHOW, the system was in trouble.
Increasing the number of statements in each COMMIT (when reasonable) may help performance some.
innodb_log_file_size, at 15GB, is bigger than necessary, but I see no need to change it.
Thousands of tables is usually not a good design.
eq_range_index_dive_limit = 200 concerns me, but I don't know how to advise. Was it a deliberate choice?
Why so many CREATE+DROP PROCEDURE?
Why so many SHOW commands?
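A quick way to see which of these commands dominate (a minimal check):

SHOW GLOBAL STATUS LIKE 'Com_show%';
SHOW GLOBAL STATUS LIKE 'Com%procedure';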
Details and other observations:
( Innodb_buffer_pool_pages_flushed ) = 523,716,598 / 3158494 = 165 /sec
-- Writes (flushes)
-- check innodb_buffer_pool_size
( table_open_cache ) = 10,000
-- Number of table descriptors to cache
-- Several hundred is usually good.
( (Innodb_buffer_pool_reads + Innodb_buffer_pool_pages_flushed) ) = ((61,040,718 + 523,716,598) ) / 3158494 = 185 /sec
-- InnoDB I/O
( Innodb_dblwr_pages_written/Innodb_pages_written ) = 459,782,684/523,717,889 = 87.8%
-- Seems like these values should be equal?
( Innodb_os_log_written ) = 1,071,443,919,360 / 3158494 = 339226 /sec
-- This is an indicator of how busy InnoDB is.
-- Very busy InnoDB.
( Innodb_log_writes ) = 110,905,716 / 3158494 = 35 /sec
( Innodb_os_log_written / (Uptime / 3600) / innodb_log_files_in_group / innodb_log_file_size ) = 1,071,443,919,360 / (3158494 / 3600) / 2 / 15360M = 0.0379
-- Ratio
-- (see minutes)
( Uptime / 60 * innodb_log_file_size / Innodb_os_log_written ) = 3,158,494 / 60 * 15360M / 1071443919360 = 791
-- Minutes between InnoDB log rotations. Beginning with 5.6.8, this can be changed dynamically; be sure to also change my.cnf.
-- (The recommendation of 60 minutes between rotations is somewhat arbitrary.) Adjust innodb_log_file_size. (Cannot change in AWS.)
( Com_rollback ) = 770,457 / 3158494 = 0.24 /sec
-- ROLLBACKs in InnoDB.
-- An excessive frequency of rollbacks may indicate inefficient app logic.
( Innodb_row_lock_waits ) = 632,292 / 3158494 = 0.2 /sec
-- How often there is a delay in getting a row lock.
-- May be caused by complex queries that could be optimized.
( Innodb_dblwr_writes ) = 97,725,876 / 3158494 = 31 /sec
-- "Doublewrite buffer" writes to disk. "Doublewrites" are a reliability feature. Some newer versions / configurations don't need them.
-- (Symptom of other issues)
( Innodb_row_lock_current_waits ) = 13
-- The number of row locks currently being waited for by operations on InnoDB tables. Zero is pretty normal.
-- Something big is going on?
( innodb_print_all_deadlocks ) = OFF
-- Whether to log all Deadlocks.
-- If you are plagued with Deadlocks, turn this on. Caution: If you have lots of deadlocks, this may write a lot to disk.
( local_infile ) = ON
-- local_infile = ON is a potential security issue
( bulk_insert_buffer_size / _ram ) = 8M / 716800M = 0.00%
-- Buffer for multi-row INSERTs and LOAD DATA
-- Too big could threaten RAM size. Too small could hinder such operations.
( Questions ) = 9,658,430,713 / 3158494 = 3057 /sec
-- Queries (outside SP) -- "qps"
-- >2000 may be stressing server
( Queries ) = 9,678,805,194 / 3158494 = 3064 /sec
-- Queries (including inside SP)
-- >3000 may be stressing server
( Created_tmp_tables ) = 1,107,271,497 / 3158494 = 350 /sec
-- Frequency of creating "temp" tables as part of complex SELECTs.
( Created_tmp_disk_tables ) = 297,023,373 / 3158494 = 94 /sec
-- Frequency of creating disk "temp" tables as part of complex SELECTs
-- increase tmp_table_size and max_heap_table_size.
Check the rules for temp tables on when MEMORY is used instead of MyISAM. Perhaps minor schema or query changes can avoid MyISAM.
Better indexes and reformulation of queries are more likely to help.
( (Com_insert + Com_update + Com_delete + Com_replace) / Com_commit ) = (693300264 + 214511608 + 37537668 + 0) / 1672382928 = 0.565
-- Statements per Commit (assuming all InnoDB)
-- Low: Might help to group queries together in transactions; High: long transactions strain various things.
( Select_full_join ) = 338,957,314 / 3158494 = 107 /sec
-- joins without index
-- Add suitable index(es) to tables used in JOINs.
( Select_full_join / Com_select ) = 338,957,314 / 6763083714 = 5.0%
-- % of selects that are indexless join
-- Add suitable index(es) to tables used in JOINs.
( Select_scan ) = 124,606,973 / 3158494 = 39 /sec
-- full table scans
-- Add indexes / optimize queries (unless they are tiny tables)
( Sort_merge_passes ) = 1,136,548 / 3158494 = 0.36 /sec
-- Hefty sorts
-- Increase sort_buffer_size and/or optimize complex queries.
( Com_insert + Com_delete + Com_delete_multi + Com_replace + Com_update + Com_update_multi ) = (693300264 + 37537668 + 198418338 + 0 + 214511608 + 79274476) / 3158494 = 387 /sec
-- writes/sec
-- 50 writes/sec + log flushes will probably max out I/O write capacity of normal drives
( ( Com_stmt_prepare - Com_stmt_close ) / ( Com_stmt_prepare + Com_stmt_close ) ) = ( 39 - 38 ) / ( 39 + 38 ) = 1.3%
-- Are you closing your prepared statements?
-- Add Closes.
( Com_stmt_close / Com_stmt_prepare ) = 38 / 39 = 97.4%
-- Prepared statements should be Closed.
-- Check whether all Prepared statements are "Closed".
( innodb_autoinc_lock_mode ) = 1
-- Galera: desires 2 -- 2 = "interleaved"; 1 = "consecutive" is typical; 0 = "traditional".
( Max_used_connections / max_connections ) = 701 / 700 = 100.1%
-- Peak % of connections
-- increase max_connections and/or decrease wait_timeout
( Threads_running - 1 ) = 71 - 1 = 70
-- Active threads (concurrency when data collected)
-- Optimize queries and/or schema
Abnormally large: (Most of these stem from being a very busy system.)
Com_commit = 529 /sec
Com_create_procedure = 0.01 /HR
Com_drop_procedure = 0.01 /HR
Com_delete = 12 /sec
Com_delete_multi = 63 /sec
Com_insert = 219 /sec
Com_kill = 0.69 /HR
Com_reset = 0.0011 /HR
Com_revoke = 0.0023 /HR
Com_select = 2141 /sec
Com_show_binlogs = 12 /HR
Com_show_create_func = 0.011 /HR
Com_show_privileges = 0.0034 /HR
Com_show_profile = 0.027 /HR
Com_show_profiles = 0.028 /HR
Com_show_slave_status = 0.037 /sec
Com_show_storage_engines = 12 /HR
Com_show_warnings = 0.14 /sec
Com_slave_stop = 0.0011 /HR
Com_update_multi = 25 /sec
Created_tmp_files = 0.3 /sec
Handler_commit = 3251 /sec
Handler_external_lock = 18787 /sec
Handler_prepare = 615 /sec
Handler_read_first = 239 /sec
Handler_read_key = 173669 /sec
Handler_read_next = 1291439 /sec
Handler_read_prev = 28535 /sec
Handler_read_rnd = 32789 /sec
Innodb_buffer_pool_bytes_dirty = 7.03e+10
Innodb_buffer_pool_pages_data = 3.41e+7
Innodb_buffer_pool_pages_dirty = 4.29e+6
Innodb_buffer_pool_pages_misc = 2.15e+6
Innodb_buffer_pool_pages_total = 3.62e+7
Innodb_data_fsyncs = 132 /sec
Innodb_data_writes = 232 /sec
Innodb_data_written = 5440151 /sec
Innodb_dblwr_pages_written = 145 /sec
Innodb_os_log_written / (Uptime / 3600) / innodb_log_files_in_group = 582.3MB
Innodb_pages_written = 165 /sec
Innodb_row_lock_time = 5.97e+7
Innodb_rows_deleted + Innodb_rows_inserted = 2180 /sec
Innodb_rows_inserted = 2155 /sec
Innodb_rows_read = 1398531 /sec
Max_used_connections = 701
Open_tables = 10,000
Select_full_range_join = 2.57e+7
Select_range = 130 /sec
Sort_range = 30 /sec
Sort_scan = 332 /sec
Table_open_cache_hits = 9354 /sec
Threads_running = 71
eq_range_index_dive_limit = 200
innodb_purge_threads = 4
innodb_thread_sleep_delay = 16,925
Best Answer
See this bug report, where others appear to have the same issue:
https://bugs.mysql.com/bug.php?id=83047
Note that the reports here mention Linux, but I use FreeBSD, so it does not appear to be an OS-specific issue.
The suggested workaround is to try a third-party malloc implementation, such as jemalloc or tcmalloc.
FreeBSD already uses jemalloc as its default allocator (although I was unable to confirm 100% that the mysqld executable I have was actually linked against it), so I tried installing and configuring tcmalloc. (See https://github.com/gperftools/gperftools for source; your OS may have it available as a package.)
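For reference, on systems that start the server via mysqld_safe, one way to wire in tcmalloc without rebuilding is its malloc-lib option; a sketch (the library path is distribution-specific and only an example; FreeBSD rc scripts need a different mechanism, such as setting LD_PRELOAD):

[mysqld_safe]
malloc-lib = /usr/lib/libtcmalloc_minimal.so.4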
An hour after restart, allocated memory for the mysqld process was at 5954M.
Now, 31 hours after restart, memory is at 6037M, which is an increase of just over 1%. In addition, that value has not changed for the past 15 hours, which suggests that the mysqld process is in a state where all future allocations can be satisfied internally.
It's alarming that somewhere between 5.6 and 5.7, something in MySQL changed that causes the default malloc implementation to go off the charts, and that a third-party library is required to fix it. I cannot find any official acknowledgement of this bug.
Hope this answer helps anyone else experiencing this very frustrating problem.