MySQL – Investigating RDS MySQL 8.0.16 stalls on a write-heavy workload

mysql, mysql-8.0, performance-tuning

I'm trying to track down some sporadic and difficult-to-reproduce issues with a MySQL 8.0.16 RDS instance connected to a web application with a write-heavy workload on a couple of tables with large JSON columns. One table has ~3M rows with an average row length of ~120 KB, roughly 350 GB of data total, almost all of it in the clustered index. The other has ~12M rows with an average row length of ~20 KB. The workload is roughly 80% INSERT and INSERT ... ON DUPLICATE KEY UPDATE, with some SELECTs and DELETEs. We don't run any JOINs or complicated SELECTs that could be bogging everything else down.

The issue is:

For brief periods of 1-20 seconds, once or twice every couple of weeks, all INSERT, IODKU, and DELETE queries appear to be totally unresponsive. Write latency spikes, dirty pages spike, network transmit (but not receive) spikes, everything shows up in the slow query log, and then the issue resolves itself and everything returns to normal. We have a heavily seasonal traffic pattern, and this issue first appeared on day 2 of our busy season. I'm hoping to track it down before the last week of the season, which will have a few days of very high, sustained traffic.

I have pt-stalk watching production and caught what may have been a smaller version of one of these, when Threads_running spiked to ~150 (usually it's less than 10). Of course, since it's RDS we don't have the diskstats/iostat/netstat output or anything, and the CloudWatch metric intervals are too wide to tell me much, but I do have a lot of information from SHOW ENGINE INNODB STATUS and SHOW GLOBAL STATUS which I'm struggling to sift through. So far, the thing that stood out most was the File I/O section of the INNODB STATUS output during the Threads_running spike:

Pending normal aio reads: [0, 0, 0, 0] , aio writes: [27, 25, 21, 20] ,
 ibuf aio reads:, log i/o's:, sync i/o's:
Pending flushes (fsync) log: 1; buffer pool: 6251
138942220 OS file reads, 8230899625 OS file writes, 1653263701 OS fsyncs
5.60 reads/s, 16384 avg bytes/read, 1975.05 writes/s, 215.99 fsyncs/s

Most of the time we have 0 pending aio writes so that stood out to me.
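
(For anyone else digging in: the same pending counters are also exposed as status variables, so they can be sampled between pt-stalk captures without parsing SHOW ENGINE INNODB STATUS. A minimal, generic query, nothing RDS-specific:)

SELECT VARIABLE_NAME, VARIABLE_VALUE
  FROM performance_schema.global_status
 WHERE VARIABLE_NAME IN ('Innodb_data_pending_reads',
                         'Innodb_data_pending_writes',
                         'Innodb_data_pending_fsyncs',
                         'Innodb_os_log_pending_writes',
                         'Innodb_os_log_pending_fsyncs');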

Most of our InnoDB parameters are at their defaults. I've tried bumping innodb_log_file_size to 10GB (since we write about 287 MB/min according to polling changes in Innodb_os_log_written, and the common advice is to hold an hour of redo), but that doesn't seem to have any effect when I run performance tests in a non-production environment. Production is still at the default for our version of MySQL (128 MB).
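
(For reference, that 287 MB/min figure is just the delta of Innodb_os_log_written over a minute; a minimal version of the polling, assuming performance_schema is enabled as it is by default on 8.0:)

SELECT VARIABLE_VALUE INTO @redo_before
  FROM performance_schema.global_status
 WHERE VARIABLE_NAME = 'Innodb_os_log_written';
DO SLEEP(60);
SELECT VARIABLE_VALUE INTO @redo_after
  FROM performance_schema.global_status
 WHERE VARIABLE_NAME = 'Innodb_os_log_written';
SELECT ROUND((@redo_after - @redo_before) / 1024 / 1024, 1) AS redo_mb_per_minute;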

Other important settings:

innodb_buffer_pool_size 48318382080
innodb_flush_log_at_trx_commit  1
innodb_flush_method O_DIRECT
innodb_flush_neighbors  0
innodb_io_capacity  200
innodb_io_capacity_max  2000
innodb_log_buffer_size  8388608
innodb_max_dirty_pages_pct  90.000000
innodb_max_dirty_pages_pct_lwm  10.000000

RDS is specced out like so:

64 GB RAM
16 vCPU
4096 GiB General Purpose (SSD)

To try and come up with a concrete question:

  1. Is it normal/expected to see pending aio writes like in that INNODB STATUS output?
  2. We got up to about 8,000 write IOPS according to CloudWatch when the incidents occurred; does that mean innodb_io_capacity_max is too low at 2000?
  3. Do the RDS specs look appropriate?
  4. What can I look at to pin this down further as either a configuration issue, a hardware issue with the RDS SSD, or a spec issue with the instance?

I appreciate your time and patience; as a humble application developer, it's hard to know whether I'm barking up the right tree or even in the right forest.

EDIT 1:

Adding show create for our largest table and obfuscated example of our most common query:

SHOW CREATE TABLE A:

CREATE TABLE `A` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `z` json DEFAULT NULL,
  `x` json DEFAULT NULL,
  `createdAt` datetime NOT NULL,
  `updatedAt` datetime NOT NULL,
  `y` varchar(255) GENERATED ALWAYS AS (json_unquote(json_extract(`z`,_utf8mb4'$.nested.other_id'))) VIRTUAL,
  `m` enum('about', '10', 'different', 'string', 'values') GENERATED ALWAYS AS (json_unquote(json_extract(`z`,_utf8mb4'$.nested.enum'))) VIRTUAL,
  `n` json DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `y` (`y`),
  KEY `m` (`m`)
) ENGINE=InnoDB AUTO_INCREMENT=31111111 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;

Most common query (IODKU):

INSERT INTO `A` \
  (`id`,`z`,`x`,`n`,`createdAt`,`updatedAt`) VALUES \
  ('12354678', \
   'a big json object, usually ~40-70kb, probably could be up to 250kb', \
   'a big json object almost identical to the previous column', \
   'a smaller json object', \
   '2020-12-02 18:30:13', \
   '2020-12-02 18:30:13') ON DUPLICATE KEY UPDATE \
   `id`=VALUES(`id`), `z`=VALUES(`z`), `x`=VALUES(`x`), `n`=VALUES(`n`), `updatedAt`=VALUES(`updatedAt`)
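
(For completeness, figures like the row counts, average row lengths, and on-disk sizes quoted above can be (re)checked via information_schema's InnoDB estimates, along these lines; 'B' is a stand-in for the obfuscated name of the 12M-row table:)

SELECT table_name,
       table_rows,
       avg_row_length,
       ROUND(data_length  / 1024 / 1024 / 1024, 1) AS clustered_index_gb,
       ROUND(index_length / 1024 / 1024 / 1024, 1) AS secondary_index_gb
  FROM information_schema.tables
 WHERE table_schema = DATABASE()
   AND table_name IN ('A', 'B');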

Best Answer

Analysis of GLOBAL STATUS and VARIABLES:

Observations:

  • Version: 8.0.16
  • 64 GB of RAM
  • Uptime = 188d 02:10:14
  • You are not running on Windows.
  • Running 64-bit version
  • You appear to be running entirely (or mostly) InnoDB.

The More Important Issues:

Increase binlog_cache_size to 4M.

Increase innodb_log_file_size to 2G.

Increase innodb_io_capacity to 1000. (Maybe more; I can't tell.)

Increase bulk_insert_buffer_size to 500M.

Increase host_cache_size to 1000.

Consider using DEALLOCATE PREPARE.

Turn OFF log_queries_not_using_indexes; it is mostly clutter.

Max_used_connections has reached 962; this may have led to other problems.

If you are not using Replication and not saving the binlogs for anything else, consider turning log_bin OFF; this would save a bunch of I/O.

Unusually high number of CREATE/ALTER/DROP/PURGE EVENT/TRIGGER/FUNCTION/PROCEDURE/SAVEPOINT. Though their frequency is probably insignificant, I question the schema design when such commands are used heavily.

I don't think this is valid: character_set_filesystem = utf8mb4. It is normally "binary". I worry that the filesystem may not allow such. Were other charset settings changed?
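
A rough sketch of applying the dynamic settings recommended above (the values are the suggestions, not measured optimums; on RDS they belong in the DB parameter group, since SET GLOBAL typically needs SUPER/SYSTEM_VARIABLES_ADMIN, which RDS withholds):

SET GLOBAL binlog_cache_size             = 4 * 1024 * 1024;    -- dynamic
SET GLOBAL innodb_io_capacity            = 1000;               -- dynamic
SET GLOBAL bulk_insert_buffer_size       = 500 * 1024 * 1024;  -- dynamic
SET GLOBAL host_cache_size               = 1000;               -- dynamic
SET GLOBAL log_queries_not_using_indexes = OFF;                -- dynamic
-- innodb_log_file_size is NOT dynamic: parameter group change plus an instance reboot.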

Details and other observations:

( Key_reads + Key_writes + Innodb_pages_read + Innodb_pages_written + Innodb_dblwr_writes + Innodb_buffer_pool_pages_flushed ) = (0 + 0 + 138942227 + 2025439216 + 148032874 + 2025439206) / 16251014 = 266 /sec -- IOPs? -- If the hardware can handle it, set innodb_io_capacity (now 200) to about this value.

( ( Key_reads + Key_writes + Innodb_pages_read + Innodb_pages_written + Innodb_dblwr_writes + Innodb_buffer_pool_pages_flushed ) / innodb_io_capacity / Uptime ) = ( 0 + 0 + 138942227 + 2025439216 + 148032874 + 2025439206 ) / 200 / 16251014 = 133.5% -- This may be a metric indicating what innodb_io_capacity is set reasonably. -- Increase innodb_io_capacity (now 200) if the hardware can handle it.

( table_open_cache ) = 4,000 -- Number of table descriptors to cache -- Several hundred is usually good.

( Binlog_cache_disk_use / Binlog_cache_use ) = 222,113,817 / 451574927 = 49.2% -- Spill to disk -- Increase binlog_cache_size (now 32768)
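
To re-check that ratio after raising binlog_cache_size, something like:

SELECT
  MAX(CASE VARIABLE_NAME WHEN 'Binlog_cache_disk_use' THEN VARIABLE_VALUE END) /
  MAX(CASE VARIABLE_NAME WHEN 'Binlog_cache_use'      THEN VARIABLE_VALUE END)
    AS binlog_cache_spill_ratio
FROM performance_schema.global_status
WHERE VARIABLE_NAME IN ('Binlog_cache_disk_use', 'Binlog_cache_use');
-- Both counters are cumulative since startup, so watch the trend after the change.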

( innodb_lru_scan_depth * innodb_page_cleaners ) = 1,024 * 4 = 4,096 -- Amount of work for page cleaners every second. -- "InnoDB: page_cleaner: 1000ms intended loop took ..." may be fixable by lowering lru_scan_depth: Consider 1000 / innodb_page_cleaners (now 4). Also check for swapping.

( innodb_page_cleaners / innodb_buffer_pool_instances ) = 4 / 8 = 0.5 -- innodb_page_cleaners -- Recommend setting innodb_page_cleaners (now 4) to innodb_buffer_pool_instances (now 8)

( innodb_lru_scan_depth ) = 1,024 -- "InnoDB: page_cleaner: 1000ms intended loop took ..." may be fixed by lowering lru_scan_depth

( Innodb_buffer_pool_pages_flushed ) = 2,025,439,206 / 16251014 = 124 /sec -- Writes (flushes) -- Increase innodb_buffer_pool_size (now 48318382080)?

( Innodb_os_log_written ) = 19,211,214,969,856 / 16251014 = 1182154 /sec -- This is an indicator of how busy InnoDB is. -- Very busy InnoDB.

( Innodb_log_writes ) = 6,037,953,736 / 16251014 = 371 /sec

( Innodb_os_log_written / (Uptime / 3600) / innodb_log_files_in_group / innodb_log_file_size ) = 19,211,214,969,856 / (16251014 / 3600) / 2 / 128M = 15.9 -- Ratio -- (see minutes)

( Uptime / 60 * innodb_log_file_size / Innodb_os_log_written ) = 16,251,014 / 60 * 128M / 19211214969856 = 1.89 -- Minutes between InnoDB log rotations. Beginning with 5.6.8 this can be changed; the log files are rebuilt at the next restart, so be sure to also change my.cnf. -- (The recommendation of 60 minutes between rotations is somewhat arbitrary.) Adjust innodb_log_file_size (now 134217728). (Cannot change in AWS.)

( Innodb_dblwr_writes ) = 148,032,874 / 16251014 = 9.1 /sec -- "Doublewrite buffer" writes to disk. "Doublewrites" are a reliability feature. Some newer versions / configurations don't need them. -- (Symptom of other issues)

( innodb_io_capacity ) = 200 -- I/O ops per second the disk is capable of. 100 for slow drives; 200 for spinning drives; 1000-2000 for SSDs; multiply by RAID factor.

( innodb_print_all_deadlocks ) = innodb_print_all_deadlocks = OFF -- Whether to log all Deadlocks. -- If you are plagued with Deadlocks, turn this on. Caution: If you have lots of deadlocks, this may write a lot to disk.

( max_connections ) = 5,230 -- Maximum number of connections (threads). Impacts various allocations. -- If max_connections (now 5230) is too high and various memory settings are high, you could run out of RAM.

( local_infile ) = local_infile = ON -- local_infile (now ON) = ON is a potential security issue

( bulk_insert_buffer_size / RAM ) = 8M / 65536M = 0.01% -- Buffer for multi-row INSERTs and LOAD DATA -- Too big could threaten RAM size. Too small could hinder such operations.

( (Com_insert + Com_update + Com_delete + Com_replace) / Com_commit ) = (367369323 + 569825858 + 4993533 + 0) / 423300126 = 2.23 -- Statements per Commit (assuming all InnoDB) -- Low: Might help to group queries together in transactions; High: long transactions strain various things.

( Com_insert + Com_delete + Com_delete_multi + Com_replace + Com_update + Com_update_multi ) = (367369323 + 4993533 + 0 + 0 + 569825858 + 0) / 16251014 = 58 /sec -- writes/sec -- 50 writes/sec + log flushes will probably max out I/O write capacity of normal drives

( ( Com_stmt_prepare - Com_stmt_close ) / ( Com_stmt_prepare + Com_stmt_close ) ) = ( 129325937 - 0 ) / ( 129325937 + 0 ) = 100.0% -- Are you closing your prepared statements? -- Add Closes.

( Com_stmt_prepare - Com_stmt_close ) = 129,325,937 - 0 = 1.29e+8 -- How many prepared statements have not been closed. -- CLOSE prepared statements

( Com_stmt_close / Com_stmt_prepare ) = 0 / 129325937 = 0 -- Prepared statements should be Closed. -- Check whether all Prepared statements are "Closed".
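
If those prepares come from a connector, the fix is closing statements in the application (or enabling its statement cache); if they are SQL-level PREPAREs, the pattern should end with DEALLOCATE PREPARE. A made-up illustration (not your actual statement):

PREPARE upsert_a FROM
  'INSERT INTO A (z, x, n, createdAt, updatedAt) VALUES (?, ?, ?, NOW(), NOW())
   ON DUPLICATE KEY UPDATE z = VALUES(z), x = VALUES(x), n = VALUES(n), updatedAt = NOW()';
SET @z = '{"a":1}', @x = '{"a":1}', @n = '{}';
EXECUTE upsert_a USING @z, @x, @n;
DEALLOCATE PREPARE upsert_a;   -- the missing step: this is what increments Com_stmt_close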

( binlog_format ) = binlog_format = MIXED -- STATEMENT/ROW/MIXED. -- ROW is preferred by 5.7 (10.3)

( expire_logs_days ) = 0 -- How soon to automatically purge binlog (after this many days). Being replaced by binlog_expire_logs_seconds. -- Too large (or zero) = consumes disk space; too small = need to respond quickly to network/machine crash. (Not relevant if log_bin (now ON) = OFF)

( log_slow_slave_statements ) = log_slow_slave_statements = OFF -- (5.6.11, 5.7.1) By default, replicated statements won't show up in the slowlog; this causes them to show. -- It can be helpful in the slowlog to see writes that could be interfering with Replica reads.

( log_queries_not_using_indexes ) = log_queries_not_using_indexes = ON -- Whether to include such in slowlog. -- This clutters the slowlog; turn it off so you can see the real slow queries. And decrease long_query_time (now 1) to catch most interesting queries.

( Max_used_connections ) = 962 -- High-water mark for connections -- Lots of inactive connections is OK; over 100 active connections is likely to be a problem. Max_used_connections (now 962) does not distinguish them; Threads_running (now 5) is instantaneous.

( Max_used_connections / host_cache_size ) = 962 / 864 = 111.3% -- Increase host_cache_size (now 864)

Abnormally small:

Select_range = 0.02 /HR
Select_range / Com_select = 0.00%
gtid_executed_compression_period = 0.22 /HR
information_schema_stats_expiry = 60

Abnormally large:

Bytes_received = 1358576 /sec
Com_alter_event = 0.00022 /HR
Com_alter_user = 0.00066 /HR
Com_begin = 26 /sec
Com_create_function = 0.0049 /HR
Com_create_procedure = 0.0058 /HR
Com_create_trigger = 0.00066 /HR
Com_create_view = 0.044 /HR
Com_drop_function = 0.0049 /HR
Com_drop_procedure = 0.0058 /HR
Com_drop_trigger = 0.00022 /HR
Com_drop_view = 0.044 /HR
Com_flush = 13 /HR
Com_purge = 4.1 /HR
Com_purge_before_date = 7.9 /HR
Com_savepoint = 21 /sec
Com_show_engine_mutex = 0.0013 /HR
Com_stmt_execute = 58 /sec
Created_tmp_files = 0.35 /sec
Innodb_buffer_pool_pages_data = 2.94e+6
Innodb_buffer_pool_pages_total = 2.95e+6
Innodb_data_pending_fsyncs = 6,252
Innodb_data_writes = 506 /sec
Innodb_data_written = 5779458 /sec
Innodb_os_log_fsyncs = 72 /sec
Innodb_os_log_written / (Uptime / 3600) / innodb_log_files_in_group = 2,029.3MB
Innodb_pages_created = 7.7 /sec
Ongoing_anonymous_transaction_count = 2
Prepared_stmt_count = 437
Ssl_accepts = 1.09e+7
Ssl_default_timeout = 7,200
Ssl_finished_accepts = 1.09e+7
Ssl_session_cache_misses = 1.09e+7
Ssl_session_cache_overflows = 106
Ssl_used_session_cache_entries = 127
Ssl_verify_depth = 1.84e+19
Ssl_verify_mode = 5
back_log / max_connections = 100.0%
host_cache_size = 864
innodb_max_dirty_pages_pct_lwm = 10
innodb_undo_tablespaces = 2
max_error_count = 1,024
max_length_for_sort_data = 4,096
max_prepared_stmt_count = 1.05e+6

Abnormal strings:

character_set_filesystem = utf8mb4
event_scheduler = ON
gtid_mode = OFF_PERMISSIVE
have_query_cache = NO
innodb_data_home_dir = /rdsdbdata/db/innodb
innodb_fast_shutdown = 1
log_bin_trust_function_creators = ON
log_statements_unsafe_for_binlog = OFF
optimizer_trace = enabled=off,one_line=off
optimizer_trace_features = greedy_search=on, range_optimizer=on, dynamic_range=on, repeated_subselect=on
relay_log_recovery = ON
slave_exec_mode = IDEMPOTENT
slave_rows_search_algorithms = INDEX_SCAN,HASH_SCAN
time_zone = UTC

From Comments

Some comments on the comments:

  • "average row length is ~120kb" -- That is quite high, so I and not surprised that the default for the default binlog_cache_size was inadequate.
  • There is not much I/O, so changing innodb_io_capacity may not make any difference.
  • I don't know what counts as a "bulk insert" for bulk_insert_buffer_size, so I can't say whether it applies here. The change is probably harmless, even if not helpful.
  • "1.89 Minutes between InnoDB log rotations" -- extremely rapid. Hence, the low value for innodb_log_file_size may be leading to the "stalls" you are experiencing.
  • A "a burst of concurrent queries" would hit the logs and other things.
  • innodb_buffer_pool_size is reasonable for the RAM size. (You can't raise it much.) I don't think you need to go to a bigger VM.
  • Why is it frequently "ramping up" and yielding a "cold" buffer_pool? Such would have a significant impact on latency (queries would run slower) and on I/O (due to a lot of reads refilling the buffer_pool cache).
  • Prepared statements (I think) come only from Stored routines. Strange numbers like yours are something of a mystery to me; I am at a loss to give you specific advice, especially if you don't have any routines. Perhaps the monitoring has some stored procedures?