MySQL replica is lagging without resource saturation

amazon-rdsMySQLmysql-5.6replication

I am running MySQL 5.6.41 on AWS RDS. I have one multi-AZ master and one multi-AZ read replica. It is a vanilla configuration MySQL configuration, so the binlog is in MIXED format and the replication process on the replica is single threaded.

When I attach a new replica, it is booted with ~1 hour of lag, and it takes 10 hours to catch up. When I turn the binlog format to ROW on the replica, the lag starts increasing indefinitely if sync_binlog=1.

I have tracked this down to the SQL_THREAD in particular on the replica that is struggling to keep up.

I'm trying to work out why replication is so slow, given that no system resources appear to be saturated.

The write IOPS on the master does not exceed 2,500/s during short-lived spikes. The average is more like 470/s. Read IOPS are very low: no more than 100/s. We have a baseline capacity of 2,700/s IOPS because we have a 900GiB General Purpose SSD. See AWS docs on this. I also don't think the disk is the bottleneck because the DiskQueueDepth doesn't exceed 1.

I don't believe the issue is saturating the network because the IO_THREAD on the replica is keeping up with the master.

The CPU and memory usage on the replica seem ok. CPU doesn't go above 8% utilization on an 8 vCPU instance, and it has 8 GiB freeable memory.

Does anyone have any ideas for how I can track this down further?

Best Answer

Single-threading is very likely to be the cause of getting "behind".

A Master can execute multiple transactions at the same time. But a single-threaded Slave is limited to one transaction at a time. This is the old, simple, way of doing Replication.

Newer replication techniques are more complex. Depending on what version of MySQL you are running, multi-threading may be limited to "one thread per database", thereby not involving much extra complexity.

(I, too, need at least A and B, plus RAM size.)

Find some of the slowest DML queries; there may be ways to speed them up. Also, do you have a lot of Reads on the Replica? If so, it could be adding to the sluggishness.

Analysis

Observations:

  • Version: 5.6.41-log
  • 30 GB of RAM
  • Uptime = 6d 07:36:37
  • You are not running on Windows.
  • Running 64-bit version
  • You appear to be running entirely (or mostly) InnoDB.

The More Important Issues:

Increase innodb_io_capacity, currently 200, to 1000. (Check AWS documentation on this topic.)

Assuming you have SSDs, turn off innodb_flush_neighbors.

Is some other process purging binlogs? (cf expire_logs_days = 0)

Ignoring errors seems dangerous. (cf binlog_error_action)

The traffic heavily leaning toward INSERTs; is this normal? What do the INSERTs look like? Single-row inserts? Multi-row? Other? We may be able to tweak them to avoid some of the slave lag.

Related to that, please discuss the apparent 275 transactions per second.

Since the main thrust of the Question is about slave lag, let's also approach this from the viewpoint of "slow queries". Set long_query_time=1 and turn on the Slow Log. (AWS may have a web page laying out details.) Do this for both the Master and the slave. I prefer these ancillary settings, especially for the "slave lag" situation:

log_queries_not_using_indexes = OFF
log_slow_admin_statements = ON
log_slow_slave_statements = ON

Details and other observations:

( innodb_lru_scan_depth ) = 1,024 -- "InnoDB: page_cleaner: 1000ms intended loop took ..." may be fixed by lowering lru_scan_depth

( (Innodb_buffer_pool_reads + Innodb_buffer_pool_pages_flushed) ) = ((3078624 + 150107660) ) / 545797 = 280 /sec -- InnoDB I/O -- Increase innodb_buffer_pool_size (now 22548578304)?

( Innodb_buffer_pool_pages_flushed ) = 150,107,660 / 545797 = 275 /sec -- Writes (flushes) -- Increase innodb_buffer_pool_size (now 22548578304)?

( Innodb_os_log_written ) = 318,630,569,472 / 545797 = 583789 /sec -- This is an indicator of how busy InnoDB is. -- Very busy InnoDB.

( Innodb_log_writes ) = 150,500,027 / 545797 = 275 /sec

( Innodb_os_log_written / (Uptime / 3600) / innodb_log_files_in_group / innodb_log_file_size ) = 318,630,569,472 / (545797 / 3600) / 2 / 128M = 7.83 -- Ratio -- (see minutes)

( Uptime / 60 * innodb_log_file_size / Innodb_os_log_written ) = 545,797 / 60 * 128M / 318630569472 = 3.83 -- Minutes between InnoDB log rotations Beginning with 5.6.8, this can be changed dynamically; be sure to also change my.cnf. -- (The recommendation of 60 minutes between rotations is somewhat arbitrary.) Adjust innodb_log_file_size (now 134217728). (Cannot change in AWS.)

( Innodb_dblwr_writes ) = 6,573,796 / 545797 = 12 /sec -- "Doublewrite buffer" writes to disk. "Doublewrites" are a reliability feature. Some newer versions / configurations don't need them. -- (Symptom of other issues)

( innodb_flush_neighbors ) = 1 -- A minor optimization when writing blocks to disk. -- Use 0 for SSD drives; 1 for HDD.

( ( Innodb_pages_read + Innodb_pages_written ) / Uptime / innodb_io_capacity ) = ( 4568524 + 150107660 ) / 545797 / 200 = 141.7% -- If > 100%, need more io_capacity. -- Increase innodb_io_capacity (now 200) if the drives can handle it.

( innodb_io_capacity ) = 200 -- I/O ops per second capable on disk . 100 for slow drives; 200 for spinning drives; 1000-2000 for SSDs; multiply by RAID factor.

( sync_binlog ) = 0 -- Use 1 for added security, at some cost of I/O =1 may lead to lots of "query end"; =0 may lead to "binlog at impossible position" and lose transactions in a crash, but is faster.

( innodb_print_all_deadlocks ) = innodb_print_all_deadlocks = OFF -- Whether to log all Deadlocks. -- If you are plagued with Deadlocks, turn this on. Caution: If you have lots of deadlocks, this may write a lot to disk.

( max_connections ) = 2,496 -- Maximum number of connections (threads). Impacts various allocations. -- If max_connections (now 2496) is too high and various memory settings are high, you could run out of RAM.

( character_set_server ) = character_set_server = latin1 -- Charset problems may be helped by setting character_set_server (now latin1) to utf8mb4. That is the future default. Caution: there can be index definition problems with utf8mb4 (but utf8 does not have these problems).

( local_infile ) = local_infile = ON -- local_infile (now ON) = ON is a potential security issue

( (Queries-Questions)/Queries ) = (303268906-1169727)/303268906 = 99.6% -- Fraction of queries that are inside Stored Routines. -- (Not bad if high; but it impacts the validity of some other conclusions.)

( Created_tmp_disk_tables ) = 570,444 / 545797 = 1 /sec -- Frequency of creating disk "temp" tables as part of complex SELECTs -- increase tmp_table_size (now 16777216) and max_heap_table_size (now 16777216). Check the rules for temp tables on when MEMORY is used instead of MyISAM. Perhaps minor schema or query changes can avoid MyISAM. Better indexes and reformulation of queries are more likely to help.

( Created_tmp_disk_tables / Questions ) = 570,444 / 1169727 = 48.8% -- Pct of queries that needed on-disk tmp table. -- Better indexes / No blobs / etc.

( Created_tmp_disk_tables / Created_tmp_tables ) = 570,444 / 729780 = 78.2% -- Percent of temp tables that spilled to disk -- Maybe increase tmp_table_size (now 16777216) and max_heap_table_size (now 16777216); improve indexes; avoid blobs, etc.

( (Com_insert + Com_update + Com_delete + Com_replace) / Com_commit ) = (150283122 + 50516 + 10949 + 0) / 150328440 = 1 -- Statements per Commit (assuming all InnoDB) -- Low: Might help to group queries together in transactions; High: long transactions strain various things.

( Select_scan ) = 778,039 / 545797 = 1.4 /sec -- full table scans -- Add indexes / optimize queries (unless they are tiny tables)

( Select_scan / Com_select ) = 778,039 / 1129934 = 68.9% -- % of selects doing full table scan. (May be fooled by Stored Routines.) -- Add indexes / optimize queries

( Com_insert + Com_delete + Com_delete_multi + Com_replace + Com_update + Com_update_multi ) = (150283122 + 10949 + 0 + 0 + 50516 + 0) / 545797 = 275 /sec -- writes/sec -- 50 writes/sec + log flushes will probably max out I/O write capacity of normal drives

( binlog_error_action ) = binlog_error_action = IGNORE_ERROR -- What to do if the binlog cannot be written. -- IGNORE_ERROR is the default for backwards compatibility, but ABORT_SERVER is recommended.

( expire_logs_days ) = 0 -- How soon to automatically purge binlog (after this many days) -- Too large (or zero) = consumes disk space; too small = need to respond quickly to network/machine crash. (Not relevant if log_bin (now ON) = OFF)

( innodb_autoinc_lock_mode ) = 1 -- Galera: desires 2 -- 2 = "interleaved"; 1 = "consecutive" is typical; 0 = "traditional". -- Galera desires 2; 2 requires BINLOG_FORMAT=ROW or MIXED

( slow_query_log ) = slow_query_log = OFF -- Whether to log slow queries. (5.1.12)

( long_query_time ) = 10 -- Cutoff (Seconds) for defining a "slow" query. -- Suggest 2

( log_slow_slave_statements ) = log_slow_slave_statements = OFF -- (5.6.11, 5.7.1) By default, replicated statements won't show up in the slowlog; this causes them to show. -- It can be helpful in the slowlog to see writes that could be interfering with Slave reads.

( Aborted_connects / Connections ) = 2,404 / 4995 = 48.1% -- Perhaps a hacker is trying to break in? (Attempts to connect)

( thread_cache_size / Max_used_connections ) = 32 / 17 = 188.2% -- There is no advantage in having the thread cache bigger than your likely number of connections. Wasting space is the disadvantage.

You have the Query Cache half-off. You should set both query_cache_type = OFF and query_cache_size = 0 . There is (according to a rumor) a 'bug' in the QC code that leaves some code on unless you turn off both of those settings.

Abnormally small:

(Com_select + Qcache_hits) / (Com_insert + Com_update + Com_delete + Com_replace) = 0.00752

Abnormally large:

Binlog_cache_use = 275 /sec
Com_begin = 275 /sec
Com_commit = 275 /sec
Com_commit + Com_rollback = 275 /sec
Com_flush = 12 /HR
Com_insert = 275 /sec
Com_kill = 3 /HR
Com_purge = 12 /HR
Com_show_grants = 0.66 /HR
Com_show_slave_status = 60 /HR
Com_slave_start = 0.0066 /HR
Connection_errors_peer_address = 0.0066 /HR
Handler_commit = 1377 /sec
Handler_commit/Questions = 642
Handler_delete = 330 /sec
Handler_prepare = 1101 /sec
Innodb_buffer_pool_pages_misc = 135,294
Innodb_buffer_pool_pages_misc * 16384 / innodb_buffer_pool_size = 9.8%
Innodb_buffer_pool_write_requests = 9230 /sec
Innodb_data_fsyncs = 300 /sec
Innodb_data_writes = 563 /sec
Innodb_data_writes - Innodb_log_writes - Innodb_dblwr_writes = 275 /sec
Innodb_data_written = 9595901 /sec
Innodb_dblwr_pages_written = 275 /sec
Innodb_os_log_fsyncs = 276 /sec
Innodb_os_log_pending_fsyncs = 1
Innodb_os_log_written / (Uptime / 3600) / innodb_log_files_in_group = 1,002.1MB
Innodb_pages_written = 275 /sec
Innodb_rows_deleted = 330 /sec
Innodb_rows_deleted + Innodb_rows_inserted = 669 /sec
Prepared_stmt_count = 4
Ssl_accepts = 1,019
Ssl_finished_accepts = 1,019
Ssl_session_cache_overflows = 340
Ssl_used_session_cache_entries = 128
performance_schema_max_cond_instances = 12,884
performance_schema_max_socket_instances = 5,012
performance_schema_max_thread_instances = 5,092

Abnormal strings:

Slave_running = ON
innodb_fast_shutdown = 1
log_output = TABLE
optimizer_trace = enabled=off,one_line=off
optimizer_trace_features = greedy_search=on, range_optimizer=on, dynamic_range=on, repeated_subselect=on
read_only = ON
relay_log_recovery = ON
slave_rows_search_algorithms = TABLE_SCAN,INDEX_SCAN
time_zone = UTC