MySQL – InnoDB – Intermittent CPU spikes on a large database

Tags: innodb, mysql, mysql-5.6

I have a large MySQL 5.6 database (multiple tables with 10-100 million rows). It runs on a 16-core machine and carries a considerable load, especially in the evening.

No matter what the load is though, we always get these intermittent spikes that can cause problems. Our CPU load looks like this:

[Image: CPU load graph]

(These are very light-load times; the period before 6 am in particular should be basically idle.)

The only solution to the problem I have found so far is setting up a new server, mirroring the data and switching to that one. This usually buys me about 2-3 months, then the spikes start appearing again. Just restarting MySQL or rebooting the server does not change anything.

These are also not caused by cronjobs. Even if I disable all of them, this still happens.

Here is a gist of the InnoDB status right now:

https://gist.github.com/fleshgolem/de1d4a661fb545fabfda

And here is a dump of the server variables:

Variable_name   Value
auto_increment_increment    1
auto_increment_offset   1
autocommit  ON
automatic_sp_privileges ON
back_log    200
basedir /usr
big_tables  OFF
bind_address    *
binlog_cache_size   32768
binlog_checksum CRC32
binlog_direct_non_transactional_updates OFF
binlog_format   ROW
binlog_max_flush_queue_time 0
binlog_order_commits    ON
binlog_row_image    FULL
binlog_rows_query_log_events    OFF
binlog_stmt_cache_size  32768
block_encryption_mode   aes-128-ecb
bulk_insert_buffer_size 8388608
character_set_client    utf8
character_set_connection    utf8
character_set_database  latin1
character_set_filesystem    binary
character_set_results   utf8
character_set_server    latin1
character_set_system    utf8
character_sets_dir  /usr/share/mysql/charsets/
collation_connection    utf8_general_ci
collation_database  latin1_swedish_ci
collation_server    latin1_swedish_ci
completion_type NO_CHAIN
concurrent_insert   AUTO
connect_timeout 10
core_file   OFF
datadir /var/lib/mysql/
date_format %Y-%m-%d
datetime_format %Y-%m-%d %H:%i:%s
default_storage_engine  InnoDB
default_tmp_storage_engine  InnoDB
default_week_format 0
delay_key_write ON
delayed_insert_limit    100
delayed_insert_timeout  300
delayed_queue_size  1000
disconnect_on_expired_password  ON
div_precision_increment 4
end_markers_in_json OFF
enforce_gtid_consistency    ON
eq_range_index_dive_limit   10
error_count 0
event_scheduler OFF
expire_logs_days    0
explicit_defaults_for_timestamp OFF
external_user   
flush   OFF
flush_time  0
foreign_key_checks  ON
ft_boolean_syntax   + -><()~*:""&|
ft_max_word_len 84
ft_min_word_len 4
ft_query_expansion_limit    20
ft_stopword_file    (built-in)
general_log OFF
general_log_file    /var/lib/mysql/xxxx.log
group_concat_max_len    1024
gtid_executed   
gtid_mode   ON
gtid_next   AUTOMATIC
gtid_owned  
gtid_purged 
have_compress   YES
have_crypt  YES
have_dynamic_loading    YES
have_geometry   YES
have_openssl    DISABLED
have_profiling  YES
have_query_cache    YES
have_rtree_keys YES
have_ssl    DISABLED
have_symlink    DISABLED
host_cache_size 640
hostname    xxxx
identity    0
ignore_builtin_innodb   OFF
ignore_db_dirs  
init_connect    
init_file   
init_slave  
innodb_adaptive_flushing    ON
innodb_adaptive_flushing_lwm    10
innodb_adaptive_hash_index  ON
innodb_adaptive_max_sleep_delay 150000
innodb_additional_mem_pool_size 8388608
innodb_api_bk_commit_interval   5
innodb_api_disable_rowlock  OFF
innodb_api_enable_binlog    OFF
innodb_api_enable_mdl   OFF
innodb_api_trx_level    0
innodb_autoextend_increment 64
innodb_autoinc_lock_mode    1
innodb_buffer_pool_dump_at_shutdown OFF
innodb_buffer_pool_dump_now OFF
innodb_buffer_pool_filename ib_buffer_pool
innodb_buffer_pool_instances    8
innodb_buffer_pool_load_abort   OFF
innodb_buffer_pool_load_at_startup  OFF
innodb_buffer_pool_load_now OFF
innodb_buffer_pool_size 42949672960
innodb_change_buffer_max_size   25
innodb_change_buffering all
innodb_checksum_algorithm   innodb
innodb_checksums    ON
innodb_cmp_per_index_enabled    OFF
innodb_commit_concurrency   0
innodb_compression_failure_threshold_pct    5
innodb_compression_level    6
innodb_compression_pad_pct_max  50
innodb_concurrency_tickets  5000
innodb_data_file_path   ibdata1:12M:autoextend
innodb_data_home_dir    
innodb_disable_sort_file_cache  OFF
innodb_doublewrite  OFF
innodb_fast_shutdown    1
innodb_file_format  Antelope
innodb_file_format_check    ON
innodb_file_format_max  Antelope
innodb_file_per_table   ON
innodb_flush_log_at_timeout 1
innodb_flush_log_at_trx_commit  2
innodb_flush_method O_DIRECT
innodb_flush_neighbors  0
innodb_flushing_avg_loops   30
innodb_force_load_corrupted OFF
innodb_force_recovery   0
innodb_ft_aux_table 
innodb_ft_cache_size    8000000
innodb_ft_enable_diag_print OFF
innodb_ft_enable_stopword   ON
innodb_ft_max_token_size    84
innodb_ft_min_token_size    3
innodb_ft_num_word_optimize 2000
innodb_ft_result_cache_limit    2000000000
innodb_ft_server_stopword_table 
innodb_ft_sort_pll_degree   2
innodb_ft_total_cache_size  640000000
innodb_ft_user_stopword_table   
innodb_io_capacity  200
innodb_io_capacity_max  2000
innodb_large_prefix OFF
innodb_lock_wait_timeout    50
innodb_locks_unsafe_for_binlog  OFF
innodb_log_buffer_size  8388608
innodb_log_compressed_pages ON
innodb_log_file_size    104857600
innodb_log_files_in_group   2
innodb_log_group_home_dir   ./
innodb_lru_scan_depth   1024
innodb_max_dirty_pages_pct  75
innodb_max_dirty_pages_pct_lwm  0
innodb_max_purge_lag    0
innodb_max_purge_lag_delay  0
innodb_mirrored_log_groups  1
innodb_monitor_disable  
innodb_monitor_enable   
innodb_monitor_reset    
innodb_monitor_reset_all    
innodb_old_blocks_pct   37
innodb_old_blocks_time  1000
innodb_online_alter_log_max_size    134217728
innodb_open_files   2000
innodb_optimize_fulltext_only   OFF
innodb_page_size    16384
innodb_print_all_deadlocks  OFF
innodb_purge_batch_size 300
innodb_purge_threads    1
innodb_random_read_ahead    OFF
innodb_read_ahead_threshold 56
innodb_read_io_threads  16
innodb_read_only    OFF
innodb_replication_delay    0
innodb_rollback_on_timeout  OFF
innodb_rollback_segments    128
innodb_sort_buffer_size 1048576
innodb_spin_wait_delay  6
innodb_stats_auto_recalc    ON
innodb_stats_method nulls_equal
innodb_stats_on_metadata    OFF
innodb_stats_persistent ON
innodb_stats_persistent_sample_pages    20
innodb_stats_sample_pages   8
innodb_stats_transient_sample_pages 8
innodb_status_output    OFF
innodb_status_output_locks  OFF
innodb_strict_mode  OFF
innodb_support_xa   ON
innodb_sync_array_size  1
innodb_sync_spin_loops  30
innodb_table_locks  ON
innodb_thread_concurrency   0
innodb_thread_sleep_delay   10000
innodb_undo_directory   .
innodb_undo_logs    128
innodb_undo_tablespaces 0
innodb_use_native_aio   ON
innodb_use_sys_malloc   ON
innodb_version  5.6.19
innodb_write_io_threads 16
insert_id   0
interactive_timeout 28800
join_buffer_size    262144
keep_files_on_create    OFF
key_buffer_size 8388608
key_cache_age_threshold 300
key_cache_block_size    1024
key_cache_division_limit    100
large_files_support ON
large_page_size 0
large_pages OFF
last_insert_id  0
lc_messages en_US
lc_messages_dir /usr/share/mysql/
lc_time_names   en_US
license GPL
local_infile    ON
lock_wait_timeout   31536000
locked_in_memory    OFF
log_bin ON
log_bin_basename    /var/lib/mysql/xxxx-db1-bin
log_bin_index   /var/lib/mysql/xxxx-db1-bin.index
log_bin_trust_function_creators OFF
log_bin_use_v1_row_events   OFF
log_error   /var/log/mysqld.log
log_output  FILE
log_queries_not_using_indexes   OFF
log_slave_updates   ON
log_slow_admin_statements   OFF
log_slow_slave_statements   OFF
log_throttle_queries_not_using_indexes  0
log_warnings    1
long_query_time 10.000000
low_priority_updates    OFF
lower_case_file_system  OFF
lower_case_table_names  0
master_info_repository  TABLE
master_verify_checksum  OFF
max_allowed_packet  4194304
max_binlog_cache_size   18446744073709547520
max_binlog_size 1073741824
max_binlog_stmt_cache_size  18446744073709547520
max_connect_errors  100
max_connections 750
max_delayed_threads 20
max_error_count 64
max_heap_table_size 16777216
max_insert_delayed_threads  20
max_join_size   18446744073709551615
max_length_for_sort_data    1024
max_prepared_stmt_count 16382
max_relay_log_size  0
max_seeks_for_key   18446744073709551615
max_sort_length 1024
max_sp_recursion_depth  0
max_tmp_tables  32
max_user_connections    0
max_write_lock_count    18446744073709551615
metadata_locks_cache_size   1024
metadata_locks_hash_instances   8
min_examined_row_limit  0
multi_range_count   256
myisam_data_pointer_size    6
myisam_max_sort_file_size   9223372036853727232
myisam_mmap_size    18446744073709551615
myisam_recover_options  OFF
myisam_repair_threads   1
myisam_sort_buffer_size 8388608
myisam_stats_method nulls_unequal
myisam_use_mmap OFF
net_buffer_length   16384
net_read_timeout    30
net_retry_count 10
net_write_timeout   60
new OFF
old OFF
old_alter_table OFF
old_passwords   0
open_files_limit    5000
optimizer_prune_level   1
optimizer_search_depth  62
optimizer_switch    index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,engine_condition_pushdown=on,index_condition_pushdown=on,mrr=on,mrr_cost_based=on,block_nested_loop=on,batched_key_access=off,materialization=on,semijoin=on,loosescan=on,firstmatch=on,subquery_materialization_cost_based=on,use_index_extensions=on
optimizer_trace enabled=off,one_line=off
optimizer_trace_features    greedy_search=on,range_optimizer=on,dynamic_range=on,repeated_subselect=on
optimizer_trace_limit   1
optimizer_trace_max_mem_size    16384
optimizer_trace_offset  -1
performance_schema  ON
performance_schema_accounts_size    100
performance_schema_digests_size 10000
performance_schema_events_stages_history_long_size  10000
performance_schema_events_stages_history_size   10
performance_schema_events_statements_history_long_size  10000
performance_schema_events_statements_history_size   10
performance_schema_events_waits_history_long_size   10000
performance_schema_events_waits_history_size    10
performance_schema_hosts_size   100
performance_schema_max_cond_classes 80
performance_schema_max_cond_instances   5900
performance_schema_max_file_classes 50
performance_schema_max_file_handles 32768
performance_schema_max_file_instances   7693
performance_schema_max_mutex_classes    200
performance_schema_max_mutex_instances  19500
performance_schema_max_rwlock_classes   40
performance_schema_max_rwlock_instances 10300
performance_schema_max_socket_classes   10
performance_schema_max_socket_instances 1520
performance_schema_max_stage_classes    150
performance_schema_max_statement_classes    168
performance_schema_max_table_handles    4000
performance_schema_max_table_instances  12500
performance_schema_max_thread_classes   50
performance_schema_max_thread_instances 1600
performance_schema_session_connect_attrs_size   512
performance_schema_setup_actors_size    100
performance_schema_setup_objects_size   100
performance_schema_users_size   100
pid_file    /var/run/mysqld/mysqld.pid
plugin_dir  /usr/lib64/mysql/plugin/
port    3306
preload_buffer_size 32768
profiling   OFF
profiling_history_size  15
protocol_version    10
proxy_user  
pseudo_slave_mode   OFF
pseudo_thread_id    2018
query_alloc_block_size  8192
query_cache_limit   1048576
query_cache_min_res_unit    4096
query_cache_size    1048576000
query_cache_type    ON
query_cache_wlock_invalidate    OFF
query_prealloc_size 8192
rand_seed1  0
rand_seed2  0
range_alloc_block_size  4096
read_buffer_size    131072
read_only   OFF
read_rnd_buffer_size    262144
relay_log   
relay_log_basename  
relay_log_index 
relay_log_info_file relay-log.info
relay_log_info_repository   TABLE
relay_log_purge ON
relay_log_recovery  OFF
relay_log_space_limit   0
report_host 10.129.156.251
report_password 
report_port 3306
report_user 
rpl_stop_slave_timeout  31536000
secure_auth ON
secure_file_priv    
server_id   1
server_id_bits  32
server_uuid 96bcdb70-44c5-11e5-8df9-0401654ee301
skip_external_locking   ON
skip_name_resolve   OFF
skip_networking OFF
skip_show_database  OFF
slave_allow_batching    OFF
slave_checkpoint_group  512
slave_checkpoint_period 300
slave_compressed_protocol   OFF
slave_exec_mode STRICT
slave_load_tmpdir   /tmp
slave_max_allowed_packet    1073741824
slave_net_timeout   3600
slave_parallel_workers  0
slave_pending_jobs_size_max 16777216
slave_rows_search_algorithms    TABLE_SCAN,INDEX_SCAN
slave_skip_errors   ALL
slave_sql_verify_checksum   ON
slave_transaction_retries   10
slave_type_conversions  
slow_launch_time    2
slow_query_log  OFF
slow_query_log_file /var/lib/mysql/xxxx-slow.log
socket  /var/lib/mysql/mysql.sock
sort_buffer_size    262144
sql_auto_is_null    OFF
sql_big_selects ON
sql_buffer_result   OFF
sql_log_bin ON
sql_log_off OFF
sql_mode    STRICT_TRANS_TABLES,NO_ENGINE_SUBSTITUTION
sql_notes   ON
sql_quote_show_create   ON
sql_safe_updates    OFF
sql_select_limit    18446744073709551615
sql_slave_skip_counter  0
sql_warnings    OFF
ssl_ca  
ssl_capath  
ssl_cert    
ssl_cipher  
ssl_crl 
ssl_crlpath 
ssl_key 
storage_engine  InnoDB
stored_program_cache    256
sync_binlog 0
sync_frm    ON
sync_master_info    1
sync_relay_log  10000
sync_relay_log_info 10000
system_time_zone    EST
table_definition_cache  1400
table_open_cache    2000
table_open_cache_instances  1
thread_cache_size   15
thread_concurrency  10
thread_handling one-thread-per-connection
thread_stack    262144
time_format %H:%i:%s
time_zone   SYSTEM
timed_mutexes   OFF
timestamp   1457343545.452900
tmp_table_size  16777216
tmpdir  /tmp
transaction_alloc_block_size    8192
transaction_allow_batching  OFF
transaction_prealloc_size   4096
tx_isolation    REPEATABLE-READ
tx_read_only    OFF
unique_checks   ON
updatable_views_with_limit  YES
version 5.6.19-log
version_comment MySQL Community Server (GPL)
version_compile_machine x86_64
version_compile_os  Linux
wait_timeout    28800
warning_count   0

If there is any more relevant info you need, please let me know and I will update the question accordingly. I unfortunately have no clue anymore what I am looking for.

Best Answer

This is probably it:

query_cache_size    1048576000
query_cache_type    ON

What is happening: when a write comes in, the entire 1 GB query cache (QC) has to be scanned to find and purge every entry that references the written table. That takes a lot of CPU time. Meanwhile, all SELECTs are blocked.

Do not set the size bigger than about 50M, regardless of how much RAM you have. It would probably be wise to also use DEMAND instead of ON, and hand-pick which SELECTs to have SQL_CACHE and which to have SQL_NO_CACHE.

Or it may be that the QC is not worth having on at all. This is the common case for Production systems that have constant write traffic.
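A minimal sketch of how one might check whether the QC is earning its keep, and then shrink or disable it at runtime. The 50M figure is the upper bound suggested above; whether to shrink or disable is a judgment call for your workload. These changes are not persistent, so the same values would also need to go into my.cnf, and a change to the global query_cache_type only affects connections opened afterwards.

```sql
-- Inspect the query cache counters (hits, inserts, free memory, prunes).
SHOW GLOBAL STATUS LIKE 'Qcache%';

-- Option A: shrink the cache to 50M (50 * 1024 * 1024 bytes).
SET GLOBAL query_cache_size = 52428800;

-- Option B: turn the query cache off entirely.
SET GLOBAL query_cache_size = 0;
SET GLOBAL query_cache_type = OFF;
```

If the CPU spikes disappear after Option B, that is decent evidence the QC invalidation was the cause.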

More...

Based on VARIABLES and GLOBAL STATUS

Observations:

Version: 5.6.19-log
50 GB of RAM
Uptime = 11d 07:07:58
You are not running on Windows.
Running 64-bit version
It appears that you are running both MyISAM and InnoDB.

The More Important Issues

Either convert completely to InnoDB or tweak the cache sizes. Suggest:

innodb_buffer_pool_size = 30G -- you are not using all of the 40G now
key_buffer_size = 2G -- your current 8M is not efficient for writes
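To see which tables are still MyISAM before deciding between the two routes, a query along these lines against information_schema should work on 5.6. The table name in the ALTER is a placeholder, not something taken from the question.

```sql
-- List user tables that are still MyISAM, with approximate sizes.
SELECT TABLE_SCHEMA, TABLE_NAME, ENGINE,
       ROUND((DATA_LENGTH + INDEX_LENGTH) / 1024 / 1024) AS size_mb
FROM information_schema.TABLES
WHERE ENGINE = 'MyISAM'
  AND TABLE_SCHEMA NOT IN ('mysql', 'information_schema', 'performance_schema');

-- Convert one table (hypothetical name). This rebuilds the whole table,
-- so try it on a copy or a replica first.
ALTER TABLE mydb.my_table ENGINE = InnoDB;
```

If everything ends up in InnoDB, key_buffer_size can stay small and the RAM is better left to the buffer pool.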

query_cache_size is really bad at 1000M. Your usage is moderately effective, so consider:

  • Add SQL_CACHE or SQL_NO_CACHE to all SELECTs, based on which ones are likely to benefit,
  • Decrease query_cache_size to only 50M
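A rough illustration of the per-query hints, assuming query_cache_type is switched to DEMAND (the setting under which only SELECTs marked SQL_CACHE are cached). The tables and columns here are made up for the example.

```sql
-- Cache only SELECTs that explicitly ask for it (applies to new connections).
SET GLOBAL query_cache_type = DEMAND;

-- Hypothetical rarely-changing lookup table: a reasonable caching candidate.
SELECT SQL_CACHE id, name FROM countries WHERE continent = 'EU';

-- Hypothetical frequently-written table: keep it out of the cache.
SELECT SQL_NO_CACHE id, status FROM orders WHERE customer_id = 42;
```

Under DEMAND the SQL_NO_CACHE hint is redundant but harmless; it matters if you keep query_cache_type = ON and want to exclude specific queries.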

A lot of queries are using tmp tables and, worse, disk tmp tables. Using the slowlog, find out which queries are the most invasive; let's work on them.

Raise tmp_table_size and max_heap_table_size from 16M to 32M (but no more). Since there are two ways that tmp tables can turn into 'disk tmp tables', this might prevent some conversions.
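As a sketch, the two limits can be raised at runtime as below; new sessions pick up the global values. Both need to move together because the in-memory limit for an internal tmp table is the smaller of the two.

```sql
-- Raise both limits to 32M (32 * 1024 * 1024 bytes).
SET GLOBAL tmp_table_size      = 33554432;
SET GLOBAL max_heap_table_size = 33554432;

-- Compare Created_tmp_disk_tables against Created_tmp_tables over time
-- to see whether the disk share drops.
SHOW GLOBAL STATUS LIKE 'Created_tmp%';
```

Tmp tables that contain TEXT or BLOB columns go to disk regardless of these settings, which is the second of the two ways mentioned above.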

slave_skip_errors = ALL -- Sweeping problems under the rug. Big time!

Details and other observations

( Innodb_buffer_pool_pages_free * 16384 / innodb_buffer_pool_size ) = 1,864,255 * 16384 / 42949672960 = 71.1% -- buffer pool free -- buffer_pool_size is bigger than working set; could decrease it
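The same percentage can be computed directly on the server. This is a sketch that assumes the 16 KB page size shown in the variable dump and uses the information_schema status tables available in 5.6 (later versions moved them to performance_schema).

```sql
-- Percentage of the InnoDB buffer pool that is currently free.
SELECT ROUND(100 * s.VARIABLE_VALUE * 16384 / v.VARIABLE_VALUE, 1)
         AS pct_buffer_pool_free
FROM information_schema.GLOBAL_STATUS    AS s
JOIN information_schema.GLOBAL_VARIABLES AS v
  ON v.VARIABLE_NAME = 'innodb_buffer_pool_size'
WHERE s.VARIABLE_NAME = 'Innodb_buffer_pool_pages_free';
```

A value that stays high after days of uptime supports the point that the working set is smaller than the pool.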

( Innodb_log_writes ) = 40,597,718 / 976078 = 42 /sec

( Com_rollback ) = 99,316,489 / 976078 = 101 /sec -- ROLLBACKs in InnoDB. -- An excessive frequency of rollbacks may indicate inefficient app logic.

( local_infile ) = ON -- local_infile = ON is a potential security issue

( Key_writes / Key_write_requests ) = 2,200,386 / 4113735 = 53.5% -- key_buffer effectiveness for writes -- If you have enough RAM, it would be worthwhile to increase key_buffer_size.

( query_cache_size ) = 1000M -- Size of QC -- Too small = not of much use. Too large = too much overhead. Recommend either 0 or no more than 50M.

( Qcache_not_cached ) = 235,136,200 / 976078 = 240 /sec -- SQL_CACHE attempted, but ignored -- Rethink caching; tune qcache

( Qcache_inserts - Qcache_queries_in_cache ) = (258097280 - 7736) / 976078 = 264 /sec -- Invalidations/sec.

( (query_cache_size - Qcache_free_memory) / Qcache_queries_in_cache / query_alloc_block_size ) = (1000M - 11519952) / 7736 / 8192 = 16.4 -- query_alloc_block_size vs formula -- Adjust query_alloc_block_size

( Created_tmp_tables ) = 31,198,441 / 976078 = 32 /sec -- Frequency of creating "temp" tables as part of complex SELECTs.

( Created_tmp_disk_tables ) = 5,996,371 / 976078 = 6.1 /sec -- Frequency of creating disk "temp" tables as part of complex SELECTs -- increase tmp_table_size and max_heap_table_size. Check the rules for temp tables being able to use MEMORY instead of MyISAM. It may be possible to make a minor schema or query change to avoid MyISAM. Better indexes and reformulation of queries are more likely to help.

( Handler_read_rnd_next ) = 1,066,165,445,800 / 976078 = 1092295 /sec -- High if lots of table scans -- possibly inadequate keys

( Com_rollback / Com_commit ) = 99,316,489 / 49548802 = 200.4% -- Rollback : Commit ratio -- Rollbacks are costly; change app logic

( Select_scan ) = 16,213,392 / 976078 = 17 /sec -- full table scans -- Add indexes / optimize queries (unless they are tiny tables)

( Com_insert + Com_delete + Com_delete_multi + Com_replace + Com_update + Com_update_multi ) = (57757683 + 26581027 + 0 + 0 + 34482709 + 0) / 976078 = 121 /sec -- writes/sec -- 50 writes/sec + log flushes will probably max out I/O write capacity of normal drives

( expire_logs_days ) = 0 -- How soon to automatically purge binlog (after this many days) -- Too large (or zero) = consumes disk space; too small = need to respond quickly to network/machine crash. (Not relevant if log_bin = OFF)

( slow_query_log ) = OFF -- Whether to log slow queries. (5.1.12)

( long_query_time ) = 10.000000 = 10 -- Cutoff (Seconds) for defining a "slow" query. -- Suggest 2
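A minimal way to act on the two slowlog items above without a restart. The threshold of 2 seconds is the value suggested here; the global long_query_time only applies to connections opened after the change.

```sql
-- Lower the threshold first, then switch the log on.
SET GLOBAL long_query_time = 2;
SET GLOBAL slow_query_log  = ON;

-- Confirm where the log is being written.
SELECT @@global.slow_query_log_file;
```

After a day or so of traffic, the file can be summarized with mysqldumpslow (ships with MySQL) or pt-query-digest to find the worst queries.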

( Aborted_clients / Connections ) = 33,444 / 45497 = 73.5% -- Threads bumped due to timeout -- Increase wait_timeout; be nice, use disconnect

( Threads_created / Connections ) = 3,675 / 45497 = 8.1% -- Rapidity of process creation -- Increase thread_cache_size

innodb_log_file_size is small (but hard to change).

Good caching in buffer_pool.

Good caching of table_definitions.

Com_delete = 27/sec

Any swapping?

GTID -- is this Master?