I'm trying to track down some sporadic and difficult-to-reproduce issues with a MySQL 8.0.16 RDS instance connected to a web application with a write-heavy workload on a couple tables with large JSON columns. One table has ~3M rows, average row length is ~120kb, like 350GB of data total, almost all on the clustered index. The other has ~12M rows, average row length is ~20kb. The workload is like 80% INSERT and INSERT ON DUPLICATE KEY UPDATE, with some SELECTs and DELETEs. We don't run any JOINs or complicated SELECTs that could be bogging everything else down.
The issue is:
For brief periods, like 1-20 seconds, once or twice every couple weeks, all INSERT, IODKU, and DELETE queries appear to be totally unresponsive. Write latency spikes, dirty pages spike, network transmit (but not receive) spikes, everything shows up in the slow query log, and then the issue resolves itself and everything returns to normal. We have a heavily seasonal traffic pattern and this issue first appeared on day 2 of our busy season. I'm hoping to track it down before the last week of the season which will have a few days of very high, sustained traffic.
I have pt-stalk watching production and caught maybe a smaller version of one of these when Threads_running spiked to ~150 (usually it's less than 10). Of course, since it's RDS we don't have the diskstat/iostat/netstat output or anything, and the CloudWatch metric intervals are too wide to tell me anything, but I do have a lot of information from SHOW ENGINE INNODB STATUS and SHOW GLOBAL STATUS, which I'm struggling to sift through. So far, the thing that stood out most was the File I/O section of the SHOW ENGINE INNODB STATUS output during the Threads_running spike:
Pending normal aio reads: [0, 0, 0, 0] , aio writes: [27, 25, 21, 20] ,
ibuf aio reads:, log i/o's:, sync i/o's:
Pending flushes (fsync) log: 1; buffer pool: 6251
138942220 OS file reads, 8230899625 OS file writes, 1653263701 OS fsyncs
5.60 reads/s, 16384 avg bytes/read, 1975.05 writes/s, 215.99 fsyncs/s
Most of the time we have 0 pending aio writes so that stood out to me.
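For reference, the pending write counts can be pulled out of the status text mechanically when comparing samples over time. A quick sketch (the regex assumes the 8.0-style File I/O line format quoted above, and `pending_aio_writes` is just an illustrative helper, not anything from pt-stalk):

```python
import re

# Illustrative helper: extract the per-thread "Pending normal aio writes" counts
# from the File I/O section of SHOW ENGINE INNODB STATUS text.
def pending_aio_writes(status_text):
    m = re.search(r"aio writes:\s*\[([\d,\s]+)\]", status_text)
    if not m:
        return []
    return [int(n) for n in m.group(1).split(",")]

sample = """Pending normal aio reads: [0, 0, 0, 0] , aio writes: [27, 25, 21, 20] ,
 ibuf aio reads:, log i/o's:, sync i/o's:
Pending flushes (fsync) log: 1; buffer pool: 6251"""

print(pending_aio_writes(sample))       # [27, 25, 21, 20]
print(sum(pending_aio_writes(sample)))  # 93 writes queued across the io threads
```

Logging this every few seconds alongside Threads_running makes it easy to see whether the pending-write backlog and the thread pile-up start at the same moment.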
Most of our InnoDB parameters are at their defaults. I've tried bumping innodb_log_file_size to 10GB (we write about 287 MB/min according to polling changes in Innodb_os_log_written, and everybody says we should have an hour of redo), but that doesn't seem to have any effect when I run performance tests in a non-production environment. Production is still at the default for our version of MySQL (128MB).
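To put numbers on the "hour of redo" rule of thumb (this assumes the default innodb_log_files_in_group = 2, so total redo capacity is 2 × innodb_log_file_size):

```python
# Back-of-envelope redo headroom from the ~287 MB/min rate measured by polling
# Innodb_os_log_written. Assumes the default of 2 redo log files.
redo_mb_per_min = 287

def minutes_of_redo(log_file_size_mb, n_files=2):
    return n_files * log_file_size_mb / redo_mb_per_min

print(round(minutes_of_redo(128), 2))        # default 128MB files -> under a minute
print(round(minutes_of_redo(10 * 1024), 1))  # 10GB files -> roughly an hour
```

At the default, the whole redo log turns over in under a minute at this write rate, which forces aggressive checkpoint flushing; that's consistent with the dirty-page and write-latency spikes described above.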
Other important settings:
innodb_buffer_pool_size 48318382080
innodb_flush_log_at_trx_commit 1
innodb_flush_method O_DIRECT
innodb_flush_neighbors 0
innodb_io_capacity 200
innodb_io_capacity_max 2000
innodb_log_buffer_size 8388608
innodb_max_dirty_pages_pct 90.000000
innodb_max_dirty_pages_pct_lwm 10.000000
RDS is specced out like so:
64 GB RAM
16 vCPU
4096 GiB General Purpose (SSD)
To try and come up with a concrete question:
- Is it normal/expected to see pending aio writes like in that INNODB STATUS output?
- We got up to like 8000 write IOPS according to CloudWatch when the incidents occurred — does that mean innodb_io_capacity_max is too low at 2000?
- Do RDS specs look appropriate?
- What can I look at to pin this down further as either a configuration issue, a hardware issue with the RDS SSD, or a spec issue with the instance?
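On that last point, here's roughly the per-second delta polling I can run against SHOW GLOBAL STATUS, since CloudWatch's intervals are too coarse (the counter values below are made up for illustration; in practice they'd come from two consecutive samples a second apart):

```python
# Sketch: turn two SHOW GLOBAL STATUS samples taken `interval` seconds apart
# into per-second rates for a few write-related counters. The dicts would be
# populated from the database; these numbers are invented for illustration.
COUNTERS = ["Innodb_data_writes", "Innodb_os_log_written", "Innodb_data_fsyncs"]

def rates(prev, curr, interval):
    return {k: (curr[k] - prev[k]) / interval for k in COUNTERS}

prev = {"Innodb_data_writes": 1_000_000,
        "Innodb_os_log_written": 5_000_000_000,
        "Innodb_data_fsyncs": 300_000}
curr = {"Innodb_data_writes": 1_002_000,
        "Innodb_os_log_written": 5_005_000_000,
        "Innodb_data_fsyncs": 300_220}

print(rates(prev, curr, 1.0))  # writes/s, log bytes/s, fsyncs/s over the interval
```

With 1-second resolution it should be possible to see whether fsyncs/s and log bytes/s stall to zero during an incident (pointing at the volume) or stay high while queries queue (pointing at flushing/configuration).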
I appreciate your time and patience, as a humble application developer it's hard to know if I'm barking up the right tree or even in the right forest.
EDIT 1:
Adding show create for our largest table and obfuscated example of our most common query:
SHOW CREATE TABLE A:
CREATE TABLE `A` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`z` json DEFAULT NULL,
`x` json DEFAULT NULL,
`createdAt` datetime NOT NULL,
`updatedAt` datetime NOT NULL,
`y` varchar(255) GENERATED ALWAYS AS (json_unquote(json_extract(`z`,_utf8mb4'$.nested.other_id'))) VIRTUAL,
`m` enum('about', '10', 'different', 'string', 'values') GENERATED ALWAYS AS (json_unquote(json_extract(`z`,_utf8mb4'$.nested.enum'))) VIRTUAL,
`n` json DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `y` (`y`),
KEY `m` (`m`)
) ENGINE=InnoDB AUTO_INCREMENT=31111111 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
INSERT INTO `A`
  (`id`, `z`, `x`, `n`, `createdAt`, `updatedAt`)
VALUES
  ('12354678',
   'a big json object, usually ~40-70kb, probably could be up to 250kb',
   'a big json object almost identical to the previous column',
   'a smaller json object',
   '2020-12-02 18:30:13',
   '2020-12-02 18:30:13')
ON DUPLICATE KEY UPDATE
  `id` = VALUES(`id`),
  `z` = VALUES(`z`),
  `x` = VALUES(`x`),
  `n` = VALUES(`n`),
  `updatedAt` = VALUES(`updatedAt`);
Best Answer
Analysis of GLOBAL STATUS and VARIABLES:
Observations:
The More Important Issues:
Increase binlog_cache_size to 4M.
Increase innodb_log_file_size to 2G.
Increase innodb_io_capacity to 1000. (Maybe more; I can't tell.)
Increase bulk_insert_buffer_size to 500M.
Increase host_cache_size to 1000.
Consider using DEALLOCATE PREPARE to close prepared statements.
Turn OFF log_queries_not_using_indexes; it is mostly clutter.
Max_used_connections has reached 962; this may have led to other problems.
If you are not using Replication and not saving the binlogs for anything else, consider turning log_bin OFF; this would save a bunch of I/O.
Unusually high number of CREATE/ALTER/DROP/PURGE EVENT/TRIGGER/FUNCTION/PROCEDURE/SAVEPOINT. Though their frequency is probably significant, I question the schema design when such commands are used heavily.
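Collected in one place, the parameter changes above as a my.cnf-style fragment (on RDS these go in the DB parameter group; the values are the recommendations above, not something tested against this workload):

```ini
# Suggested changes from the analysis above; verify each against your workload.
binlog_cache_size       = 4M
innodb_log_file_size    = 2G      # requires an instance restart to take effect
innodb_io_capacity      = 1000    # maybe more; depends on what the volume can sustain
bulk_insert_buffer_size = 500M
host_cache_size         = 1000
log_queries_not_using_indexes = OFF
```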
I don't think this is valid:
character_set_filesystem = utf8mb4
It is normally "binary". I worry that the filesystem may not allow such. Were other charset things changed?

Details and other observations:
( Key_reads + Key_writes + Innodb_pages_read + Innodb_pages_written + Innodb_dblwr_writes + Innodb_buffer_pool_pages_flushed ) = (0 + 0 + 138942227 + 2025439216 + 148032874 + 2025439206) / 16251014 = 266 /sec
-- IOPs? -- If the hardware can handle it, set innodb_io_capacity (now 200) to about this value.

( ( Key_reads + Key_writes + Innodb_pages_read + Innodb_pages_written + Innodb_dblwr_writes + Innodb_buffer_pool_pages_flushed ) / innodb_io_capacity / Uptime ) = ( 0 + 0 + 138942227 + 2025439216 + 148032874 + 2025439206 ) / 200 / 16251014 = 133.5%
-- This is a metric indicating whether innodb_io_capacity is set reasonably. -- Increase innodb_io_capacity (now 200) if the hardware can handle it.

( table_open_cache ) = 4,000
-- Number of table descriptors to cache -- Several hundred is usually good.

( Binlog_cache_disk_use / Binlog_cache_use ) = 222,113,817 / 451574927 = 49.2%
-- Spill to disk -- Increase binlog_cache_size (now 32768)

( innodb_lru_scan_depth * innodb_page_cleaners ) = 1,024 * 4 = 4,096
-- Amount of work for page cleaners every second. -- "InnoDB: page_cleaner: 1000ms intended loop took ..." may be fixable by lowering lru_scan_depth: Consider 1000 / innodb_page_cleaners (now 4). Also check for swapping.

( innodb_page_cleaners / innodb_buffer_pool_instances ) = 4 / 8 = 0.5
-- innodb_page_cleaners -- Recommend setting innodb_page_cleaners (now 4) to innodb_buffer_pool_instances (now 8)

( innodb_lru_scan_depth ) = 1,024
-- "InnoDB: page_cleaner: 1000ms intended loop took ..." may be fixed by lowering lru_scan_depth

( Innodb_buffer_pool_pages_flushed ) = 2,025,439,206 / 16251014 = 124 /sec
-- Writes (flushes) -- Increase innodb_buffer_pool_size (now 48318382080)?

( Innodb_os_log_written ) = 19,211,214,969,856 / 16251014 = 1182154 /sec
-- This is an indicator of how busy InnoDB is. -- Very busy InnoDB.

( Innodb_log_writes ) = 6,037,953,736 / 16251014 = 371 /sec

( Innodb_os_log_written / (Uptime / 3600) / innodb_log_files_in_group / innodb_log_file_size ) = 19,211,214,969,856 / (16251014 / 3600) / 2 / 128M = 15.9
-- Ratio -- (see minutes)

( Uptime / 60 * innodb_log_file_size / Innodb_os_log_written ) = 16,251,014 / 60 * 128M / 19211214969856 = 1.89
-- Minutes between InnoDB log rotations. Beginning with 5.6.8, this can be changed dynamically; be sure to also change my.cnf. -- (The recommendation of 60 minutes between rotations is somewhat arbitrary.) Adjust innodb_log_file_size (now 134217728). (Cannot change in AWS.)

( Innodb_dblwr_writes ) = 148,032,874 / 16251014 = 9.1 /sec
-- "Doublewrite buffer" writes to disk. "Doublewrites" are a reliability feature. Some newer versions / configurations don't need them. -- (Symptom of other issues)

( innodb_io_capacity ) = 200
-- I/O ops per second capable on disk. 100 for slow drives; 200 for spinning drives; 1000-2000 for SSDs; multiply by RAID factor.

( innodb_print_all_deadlocks ) = OFF
-- Whether to log all Deadlocks. -- If you are plagued with Deadlocks, turn this on. Caution: If you have lots of deadlocks, this may write a lot to disk.

( max_connections ) = 5,230
-- Maximum number of connections (threads). Impacts various allocations. -- If max_connections (now 5230) is too high and various memory settings are high, you could run out of RAM.

( local_infile ) = ON
-- local_infile (now ON) = ON is a potential security issue

( bulk_insert_buffer_size ) = 8M / 65536M = 0.01%
-- Buffer for multi-row INSERTs and LOAD DATA -- Too big could threaten RAM size. Too small could hinder such operations.

( (Com_insert + Com_update + Com_delete + Com_replace) / Com_commit ) = (367369323 + 569825858 + 4993533 + 0) / 423300126 = 2.23
-- Statements per Commit (assuming all InnoDB) -- Low: Might help to group queries together in transactions; High: long transactions strain various things.

( Com_insert + Com_delete + Com_delete_multi + Com_replace + Com_update + Com_update_multi ) = (367369323 + 4993533 + 0 + 0 + 569825858 + 0) / 16251014 = 58 /sec
-- writes/sec -- 50 writes/sec + log flushes will probably max out I/O write capacity of normal drives

( ( Com_stmt_prepare - Com_stmt_close ) / ( Com_stmt_prepare + Com_stmt_close ) ) = ( 129325937 - 0 ) / ( 129325937 + 0 ) = 100.0%
-- Are you closing your prepared statements? -- Add Closes.

( Com_stmt_prepare - Com_stmt_close ) = 129,325,937 - 0 = 1.29e+8
-- How many prepared statements have not been closed. -- CLOSE prepared statements.

( Com_stmt_close / Com_stmt_prepare ) = 0 / 129325937 = 0
-- Prepared statements should be Closed. -- Check whether all Prepared statements are "Closed".

( binlog_format ) = MIXED
-- STATEMENT/ROW/MIXED. -- ROW is preferred by 5.7 (10.3)

( expire_logs_days ) = 0
-- How soon to automatically purge binlog (after this many days). Being replaced by binlog_expire_logs_seconds. -- Too large (or zero) = consumes disk space; too small = need to respond quickly to network/machine crash. (Not relevant if log_bin (now ON) = OFF)

( log_slow_slave_statements ) = OFF
-- (5.6.11, 5.7.1) By default, replicated statements won't show up in the slowlog; this causes them to show. -- It can be helpful in the slowlog to see writes that could be interfering with Replica reads.

( log_queries_not_using_indexes ) = ON
-- Whether to include such in slowlog. -- This clutters the slowlog; turn it off so you can see the real slow queries. And decrease long_query_time (now 1) to catch the most interesting queries.

( Max_used_connections ) = 962
-- High-water mark for connections -- Lots of inactive connections is OK; over 100 active connections is likely to be a problem. Max_used_connections (now 962) does not distinguish them; Threads_running (now 5) is instantaneous.

( Max_used_connections / host_cache_size ) = 962 / 864 = 111.3%
-- Increase host_cache_size (now 864)

Abnormally small:
Abnormally large:
Abnormal strings:
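One of those numbers is worth dwelling on: the binlog spill ratio follows directly from the counters, and with 40-70KB JSON values a single row's binlog event can exceed the default 32KB binlog_cache_size by itself. A sketch (the "typical" row size is just the midpoint of the range quoted in the question):

```python
# Binlog spill ratio from the status counters in the analysis above: the share
# of transactions whose binlog events overflowed binlog_cache_size onto disk.
binlog_cache_disk_use = 222_113_817
binlog_cache_use = 451_574_927
print(round(100 * binlog_cache_disk_use / binlog_cache_use, 1))  # 49.2 (%)

# With ~40-70KB JSON values per row, one row's binlog event alone can exceed
# the 32KB default cache, so nearly every such transaction spills to disk.
binlog_cache_size = 32_768
typical_row_image = 55_000  # assumed midpoint of the 40-70KB range in the question
print(typical_row_image > binlog_cache_size)  # True
```

That is extra disk I/O on every commit, which is why raising binlog_cache_size is near the top of the list.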
From Comments

Some comments on the Comments:
- binlog_cache_size was inadequate.
- innodb_io_capacity may not make any difference.
- innodb_log_file_size may be leading to the "stalls" you are experiencing.
- innodb_buffer_pool_size is reasonable for the RAM size. (You can't raise it much.) I don't think you need to go to a bigger VM.