Mysql – MariaDB Insert queries stuck frequently

lockingmariadbMySQLoptimization

I am currently doing high insertions into my MariaDB 10.4.12 database via scripts.

However, sometimes the insertion queries just get stuck/piled up at what it seems to be at the Update or Commit state.

Queries getting stuck

The above insertions that I am mostly doing (95% of the time) are INSERT IGNORE INTO table1 ('b1','c1','a1') VALUES (?,?,?)

Sometimes, they can even go up to 300+ seconds, until eventually all the insertions queries slowly get cleared up within 5-10 seconds (it seems like something is stuck)

I have thought of few things to improve the insert operations :

  1. Do BULK insertions instead of many single insertions
  2. Use transactions
  3. Increase innodb_log_file_size

However, ontop of doing that, I would like to know if this is due to other reasons.

When I checked my iotop , half of the time it shows that mysqld is having 99.99% I/O

I am not sure if the disks are failing (2TB x4 in mirror setup) or is there some configuration settings that I did not tuned properly?

Table schema :

table1 (
  a1 bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  b1 char(64) DEFAULT NULL,
  c1 bigint(20) unsigned DEFAULT NULL,
  d1 datetime DEFAULT NULL,
  e1 tinyint(1) DEFAULT NULL,
  f1 int(11) DEFAULT NULL,
  LastUpdate timestamp NOT NULL DEFAULT current_timestamp() ON UPDATE current_timestamp(),
  PRIMARY KEY (a1),
  UNIQUE KEY idxc1b1 (c1,b1)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

my.cnf :

[mysqld]
symbolic-links=0

innodb_thread_concurrency=32
innodb_buffer_pool_size=240G
innodb_buffer_pool_instances=64
innodb_page_cleaners=32
innodb_purge_threads=1
innodb_read_io_threads=64
innodb_write_io_threads=64
innodb_use_native_aio=0
innodb_flush_log_at_trx_commit=2
innodb_doublewrite=0
innodb_autoinc_lock_mode=2
innodb_file_per_table=1

max_connections=10000
skip_name_resolve=1
tmp_table_size=134217728
max_heap_table_size=134217728
back_log=1000
wait_timeout=900
innodb_log_buffer_size=32M
innodb_log_file_size=768M
open_files_limit=1024000
max_allowed_packet=512M

Update #1 : Added SHOW GLOBAL STATUS and SHOW ENGINE INNODB STATUS during queries piled up (stuck for 120s+)

GLOBAL STATUS : https://justpaste.it/3hxey

INNODB STATUS : https://justpaste.it/5uren

Update #2 : Added SHOW VARIABLES

SHOW VARIABLES : https://justpaste.it/3chha

Update #3 : After replacing the disks (which are also SSDs), the issue is solved. This means that choosing the right SSD disk model is very important for a database high insertion/write rate

Best Answer

Analysis of STATUS and VARIABLES

Observations:

  • Version: 10.4.12-MariaDB
  • 256 GB of RAM -- Is this correct??
  • Uptime = 52d 15:16:00
  • Are you sure this was a SHOW GLOBAL STATUS ?
  • You are not running on Windows.
  • Running 64-bit version
  • You appear to be running entirely (or mostly) InnoDB.

The More Important Issues:

Increase innodb_io_capacity to 500.

Lower innodb_lru_scan_depth to 64

Why?: innodb_change_buffering = none Normally it is a good feature.

Lots of active threads; was the server melting down as you ran SHOW GLOBAL STATUS?

Com_admin_commands is huge! What admin commands are you doing? Possibly something right before or right after each INSERT? (About 300/sec for each)

80 queries per minute are taking more than 10 seconds. Turn on the slowlog, etc.

Are you using Galera?

Details and other observations:

( Key_reads + Key_writes + Innodb_pages_read + Innodb_pages_written + Innodb_dblwr_writes + Innodb_buffer_pool_pages_flushed ) = (0 + 0 + 273050 + 951182390 + 0 + 951175516) / 4547760 = 418 /sec -- IOPs? -- If the hardware can handle it, set innodb_io_capacity (now 200) to about this value.

( ( Key_reads + Key_writes + Innodb_pages_read + Innodb_pages_written + Innodb_dblwr_writes + Innodb_buffer_pool_pages_flushed ) / innodb_io_capacity / Uptime ) = ( 0 + 0 + 273050 + 951182390 + 0 + 951175516 ) / 200 / 4547760 = 209.2% -- This may be a metric indicating what innodb_io_capacity is set reasonably. -- Increase innodb_io_capacity (now 200) if the hardware can handle it.

( innodb_buffer_pool_instances ) = 64 -- For large RAM, consider using 1-16 buffer pool instances, not allowing less than 1GB each. Also, not more than, say, twice the number of CPU cores. -- Recommend no more than 16.

( innodb_lru_scan_depth * innodb_buffer_pool_instances ) = 1,024 * 64 = 65,536 -- A metric of CPU usage. -- Lower either number.

( innodb_lru_scan_depth * innodb_page_cleaners ) = 1,024 * 32 = 32,768 -- Amount of work for page cleaners every second. -- "InnoDB: page_cleaner: 1000ms intended loop took ..." may be fixable by lowering lru_scan_depth: Consider 1000 / innodb_page_cleaners (now 32). Also check for swapping.

( innodb_page_cleaners / innodb_buffer_pool_instances ) = 32 / 64 = 0.5 -- innodb_page_cleaners -- Recommend setting innodb_page_cleaners (now 32) to innodb_buffer_pool_instances (now 64)

( innodb_lru_scan_depth ) = 1,024 -- "InnoDB: page_cleaner: 1000ms intended loop took ..." may be fixed by lowering lru_scan_depth

( innodb_io_capacity_max / innodb_io_capacity ) = 2,000 / 200 = 10 -- Capacity: max/plain -- Recommend 2. Max should be about equal to the IOPs your I/O subsystem can handle. (If the drive type is unknown 2000/200 may be a reasonable pair.)

( (Innodb_buffer_pool_reads + Innodb_buffer_pool_pages_flushed) ) = ((242249 + 951175516) ) / 4547760 = 209 /sec -- InnoDB I/O -- Increase innodb_buffer_pool_size (now 257698037760)?

( Innodb_buffer_pool_pages_flushed ) = 951,175,516 / 4547760 = 209 /sec -- Writes (flushes) -- Increase innodb_buffer_pool_size (now 257698037760)?

( innodb_change_buffering ) = innodb_change_buffering = none -- Pre-5.6.11 / 5.5.31, there was a bug that made ="changes" a safer option.

( innodb_doublewrite ) = innodb_doublewrite = OFF -- Extra I/O, but extra safety in crash. -- OFF is OK for FusionIO, Galera, Slaves, ZFS.

( Innodb_os_log_written ) = 1,867,005,211,648 / 4547760 = 410532 /sec -- This is an indicator of how busy InnoDB is. -- Very idle or very busy InnoDB.

( Innodb_log_writes ) = 1,029,874,951 / 4547760 = 226 /sec

( innodb_flush_method ) = innodb_flush_method = fsync -- How InnoDB should ask the OS to write blocks. Suggest O_DIRECT or O_ALL_DIRECT (Percona) to avoid double buffering. (At least for Unix.) See chrischandler for caveat about O_ALL_DIRECT

( default_tmp_storage_engine ) = default_tmp_storage_engine =

( innodb_flush_neighbors ) = 1 -- A minor optimization when writing blocks to disk. -- Use 0 for SSD drives; 1 for HDD.

( ( Innodb_pages_read + Innodb_pages_written ) / Uptime / innodb_io_capacity ) = ( 273050 + 951182390 ) / 4547760 / 200 = 104.6% -- If > 100%, need more io_capacity. -- Increase innodb_io_capacity (now 200) if the drives can handle it.

( innodb_io_capacity ) = 200 -- I/O ops per second capable on disk . 100 for slow drives; 200 for spinning drives; 1000-2000 for SSDs; multiply by RAID factor.

( sync_binlog ) = 0 -- Use 1 for added security, at some cost of I/O =1 may lead to lots of "query end"; =0 may lead to "binlog at impossible position" and lose transactions in a crash, but is faster.

( innodb_print_all_deadlocks ) = innodb_print_all_deadlocks = OFF -- Whether to log all Deadlocks. -- If you are plagued with Deadlocks, turn this on. Caution: If you have lots of deadlocks, this may write a lot to disk.

( max_connections ) = 15,934 -- Maximum number of connections (threads). Impacts various allocations. -- If max_connections (now 15934) is too high and various memory settings are high, you could run out of RAM.

( character_set_server ) = character_set_server = latin1 -- Charset problems may be helped by setting character_set_server (now latin1) to utf8mb4. That is the future default.

( local_infile ) = local_infile = ON -- local_infile (now ON) = ON is a potential security issue

( tmp_table_size ) = 128M -- Limit on size of MEMORY temp tables used to support a SELECT -- Decrease tmp_table_size (now 134217728) to avoid running out of RAM. Perhaps no more than 64M.

( (Com_insert + Com_update + Com_delete + Com_replace) / Com_commit ) = (1371247297 + 4324897 + 1 + 0) / 1 = 1.38e+9 -- Statements per Commit (assuming all InnoDB) -- Low: Might help to group queries together in transactions; High: long transactions strain various things.

( Select_full_join / Com_select ) = 3,469 / 49002 = 7.1% -- % of selects that are indexless join -- Add suitable index(es) to tables used in JOINs.

( Select_scan / Com_select ) = 83,957 / 49002 = 171.3% -- % of selects doing full table scan. (May be fooled by Stored Routines.) -- Add indexes / optimize queries

( Com_insert + Com_delete + Com_delete_multi + Com_replace + Com_update + Com_update_multi ) = (1371247297 + 1 + 0 + 0 + 4324897 + 0) / 4547760 = 302 /sec -- writes/sec -- 50 writes/sec + log flushes will probably max out I/O write capacity of normal drives

( Com_admin_commands ) = 1,350,052,920 / 4547760 = 296 /sec -- Why so many DDL statements?

( Com_admin_commands / Queries ) = 1,350,052,920 / 1387441904 = 97.3% -- Percent of queries that are "admin" commands. -- What's going on?

( Com_set_option / Com_select ) = 3,505,412 / 49002 = 7153.6% -- It seems 'wrong' to do more SETs than SELECTs.

( Com__biggest ) = Com__biggest = Com_insert -- Which of the "Com_" metrics is biggest. -- Normally it is Com_select (now 49002). If something else, then it may be a sloppy platform, or may be something else.

( binlog_format ) = binlog_format = MIXED -- STATEMENT/ROW/MIXED. -- ROW is preferred by 5.7 (10.3)

( slow_query_log ) = slow_query_log = OFF -- Whether to log slow queries. (5.1.12)

( long_query_time ) = 10 -- Cutoff (Seconds) for defining a "slow" query. -- Suggest 2

( Slow_queries ) = 5,897,189 / 4547760 = 1.3 /sec -- Frequency (Slow queries per sec) -- Rework slow guys; improve indexes; watch disk space for slow log file

( Max_used_connections ) = 597 -- High-water mark for connections -- Lots of inactive connections is OK; over 100 active connections is likely to be a problem. Max_used_connections (now 597) does not distinguish them; Threads_running (now 299) is instantaneous.

( Threads_running - 1 ) = 299 - 1 = 298 -- Active threads (concurrency when data collected) -- Optimize queries and/or schema

( thread_pool_size ) = 40 -- Number of 'thread groups'. Limits how many treads can be executing at once. Probably should not be much bigger than the number of CPUs. -- Don't set much higher than the number of CPU cores.

You have the Query Cache half-off. You should set both query_cache_type = OFF and query_cache_size = 0 . There is (according to a rumor) a 'bug' in the QC code that leaves some code on unless you turn off both of those settings.

Abnormally small:

(Com_select + Qcache_hits) / (Com_insert + Com_update + Com_delete + Com_replace) = 3.6e-5
Aria_pagecache_blocks_unused = 15,687
Com_select = 39 /HR
Created_tmp_files = 0.0032 /HR
Empty_queries = 14 /HR
Handler_icp_attempts = 0.0024 /HR
Handler_icp_match = 0.0024 /HR
Handler_read_first = 0.066 /HR
Handler_read_rnd = 0.0071 /HR
Innodb_buffer_pool_reads * innodb_page_size / innodb_buffer_pool_size = 1.5%
Innodb_dblwr_pages_written = 0
Innodb_num_open_files = 6
Memory_used = 0.13%
Select_range = 0.0063 /HR
Sort_priority_queue_sorts = 0
Sort_rows = 0.078 /HR
Sort_scan = 0.013 /HR
innodb_spin_wait_delay = 4
lock_wait_timeout = 86,400
table_open_cache / max_connections = 0.126

Abnormally large:

Com_do = 0.0024 /HR
Com_insert = 301 /sec
Feature_json = 0.07 /sec
Handler_discover = 0.041 /HR
Handler_read_next / Handler_read_key = 100
Innodb_buffer_pool_pages_data = 1.53e+7
Innodb_buffer_pool_pages_dirty = 410,202
Innodb_buffer_pool_pages_flushed / max(Questions, Queries) = 0.686
Innodb_buffer_pool_pages_total = 1.54e+7
Innodb_data_pending_writes = 32
Innodb_data_writes = 435 /sec
Innodb_data_writes - Innodb_log_writes - Innodb_dblwr_writes = 209 /sec
Innodb_os_log_pending_fsyncs = 1
Innodb_os_log_pending_writes = 1
Innodb_pages_written = 209 /sec
Threads_connected = 304
Threads_running = 299
histogram_size = 254
host_cache_size = 1,103
innodb_page_cleaners = 32
innodb_read_io_threads = 64
innodb_thread_concurrency = 32
innodb_thread_sleep_delay = 48,937
innodb_write_io_threads = 64
max_long_data_size = 512MB
max_relay_log_size = 1024MB
optimizer_use_condition_selectivity = 4
performance_schema_max_cond_classes = 90
performance_schema_max_stage_classes = 160

Abnormal strings:

aria_recover_options = BACKUP,QUICK
disconnect_on_expired_password = OFF
ft_boolean_syntax = + -><()~*:\"\"&
histogram_type = DOUBLE_PREC_HB
innodb_fast_shutdown = 1
innodb_use_atomic_writes = ON
innodb_use_native_aio = OFF
log_slow_admin_statements = ON
myisam_stats_method = NULLS_UNEQUAL
old_alter_table = DEFAULT
opt_s__optimize_join_buffer_size = on
optimizer_trace = enabled=off
plugin_maturity = gamma
sql_safe_updates = ON
use_stat_tables = PREFERABLY_FOR_QUERIES
wsrep_data_home_dir = /data01/mysql/
wsrep_debug = NONE
wsrep_load_data_splitting = OFF