Mysql – CPU High usage crashing our server

cpucrashMySQLoptimization

The server where my database is running is suffering from CPU spikes. We're having trouble identifying what is causing these CPU spikes, and consequently how to mitigate them.

I've tried adding some indexes, but maybe I forgot one or two.

How do I check which table has any problems?

Once a day I have a huge CPU jump to 700%. We've been resolving it to date by restarting the server.

I can provide the necessary information to find the problem, but I don't know what info is needed.

This is the MySQLTuner report:

 >>  MySQLTuner 1.6.18 - Major Hayden <major@mhtx.net>
 >>  Bug reports, feature requests, and downloads at http://mysqltuner.com/
 >>  Run with '--help' for additional options and output filtering

[[0;34m--[0m] Skipped version check for MySQLTuner script
[[0;32mOK[0m] Logged in using credentials from debian maintenance account.
[[0;32mOK[0m] Currently running supported MySQL version 10.1.47-MariaDB-0+deb9u1
[[0;32mOK[0m] Operating on 64-bit architecture
 
-------- Storage Engine Statistics -----------------------------------------------------------------
[[0;34m--[0m] Status: [0;32m+Aria [0m[0;32m+CSV [0m[0;32m+InnoDB [0m[0;32m+MEMORY [0m[0;32m+MRG_MyISAM [0m[0;32m+MyISAM [0m[0;32m+PERFORMANCE_SCHEMA [0m[0;32m+SEQUENCE [0m
[[0;34m--[0m] Data in InnoDB tables: 1G (Tables: 387)
[[0;34m--[0m] Data in MyISAM tables: 1K (Tables: 1)
[[0;32mOK[0m] Total fragmented tables: 0
 
-------- Security Recommendations ------------------------------------------------------------------
[[0;32mOK[0m] There are no anonymous accounts for any database users
[[0;32mOK[0m] All database users have passwords assigned
[[0;31m!![0m] User 'kacper@%' hasn't specific host restriction.
[[0;34m--[0m] There are 612 basic passwords in the list.
 
-------- CVE Security Recommendations --------------------------------------------------------------
[[0;32mOK[0m] NO SECURITY CVE FOUND FOR YOUR VERSION
 
-------- Performance Metrics -----------------------------------------------------------------------
[[0;34m--[0m] Up for: 2h 17m 38s (1M q [236.530 qps], 53K conn, TX: 1G, RX: 285M)
[[0;34m--[0m] Reads / Writes: 71% / 29%
[[0;34m--[0m] Binary logging is disabled
[[0;34m--[0m] Physical Memory     : 62.8G
[[0;34m--[0m] Max MySQL memory    : 12.6G
[[0;34m--[0m] Other process memory: 209.7M
[[0;34m--[0m] Total buffers: 328.0M global + 2.8M per thread (4096 max threads)
[[0;34m--[0m] P_S Max memory usage: 1G
[[0;34m--[0m] Galera GCache Max memory usage: 0B
[[0;32mOK[0m] Maximum reached memory usage: 7.1G (11.38% of installed RAM)
[[0;32mOK[0m] Maximum possible memory usage: 12.6G (20.10% of installed RAM)
[[0;32mOK[0m] Overall possible memory usage with other process is compatible with memory available
[[0;32mOK[0m] Slow queries: 0% (0/1M)
[[0;32mOK[0m] Highest usage of available connections: 51% (2102/4096)
[[0;32mOK[0m] Aborted connections: 0.02%  (13/53000)
[[0;31m!![0m] name resolution is active : a reverse name resolution is made for each new connection and can reduce performance
[[0;32mOK[0m] Query cache efficiency: 30.4% (505K cached / 1M selects)
[[0;31m!![0m] Query cache prunes per day: 511609
[[0;32mOK[0m] Sorts requiring temporary tables: 5% (470 temp sorts / 9K sorts)
[[0;31m!![0m] Joins performed without indexes: 48
[[0;31m!![0m] Temporary tables created on disk: 76% (39K on disk / 51K total)
[[0;32mOK[0m] Thread cache hit rate: 89% (5K created / 53K connections)
[[0;32mOK[0m] Table cache hit rate: 81% (697 open / 860 opened)
[[0;32mOK[0m] Open file limit used: 0% (61/16K)
[[0;32mOK[0m] Table locks acquired immediately: 100% (998K immediate / 998K locks)
 
-------- Performance schema ------------------------------------------------------------------------
[[0;34m--[0m] Performance schema is enabled.
[[0;34m--[0m] Memory used by P_S: 1.1G
[[0;34m--[0m] Sys schema isn't installed.
 
-------- ThreadPool Metrics ------------------------------------------------------------------------
[[0;34m--[0m] ThreadPool stat is enabled.
[[0;34m--[0m] Thread Pool Size: 8 thread(s).
[[0;34m--[0m] Using default value is good enough for your version (10.1.47-MariaDB-0+deb9u1)
 
-------- MyISAM Metrics ----------------------------------------------------------------------------
[[0;31m!![0m] Key buffer used: 18.3% (3M used / 16M cache)
[[0;32mOK[0m] Key buffer size / total MyISAM indexes: 16.0M/124.0K
[[0;32mOK[0m] Read Key buffer hit rate: 97.5% (162 cached / 4 reads)
 
-------- AriaDB Metrics ----------------------------------------------------------------------------
[[0;34m--[0m] AriaDB is enabled.
[[0;32mOK[0m] Aria pagecache size / total Aria indexes: 128.0M/1B
[[0;31m!![0m] Aria pagecache hit rate: 83.9% (241K cached / 38K reads)
 
-------- InnoDB Metrics ----------------------------------------------------------------------------
[[0;34m--[0m] InnoDB is enabled.
[[0;31m!![0m] InnoDB buffer pool / data size: 128.0M/1.9G
[[0;31m!![0m] InnoDB buffer pool <= 1G and innodb_buffer_pool_instances(!=1).
[[0;34m--[0m] InnoDB Buffer Pool Chunk Size not used or defined in your version
[[0;32mOK[0m] InnoDB Read buffer efficiency: 99.87% (15073372350 hits/ 15093280124 total)
[[0;31m!![0m] InnoDB Write Log efficiency: 15.47% (49147 hits/ 317764 total)
[[0;32mOK[0m] InnoDB log waits: 0.00% (0 waits / 366911 writes)
 
-------- TokuDB Metrics ----------------------------------------------------------------------------
[[0;34m--[0m] TokuDB is disabled.
 
-------- Galera Metrics ----------------------------------------------------------------------------
[[0;34m--[0m] Galera is disabled.
 
-------- Replication Metrics -----------------------------------------------------------------------
[[0;34m--[0m] Galera Synchronous replication: NO
[[0;34m--[0m] No replication slave(s) for this server.
[[0;34m--[0m] This is a standalone server.
 
-------- Recommendations ---------------------------------------------------------------------------
General recommendations:
    Restrict Host for user@% to user@SpecificDNSorIp
    MySQL started within last 24 hours - recommendations may be inaccurate
    Configure your accounts with ip or subnets only, then update your configuration with skip-name-resolve=1
    Adjust your join queries to always utilize indexes
    When making adjustments, make tmp_table_size/max_heap_table_size equal
    Reduce your SELECT DISTINCT queries which have no LIMIT clause
Variables to adjust:
    query_cache_size (> 16M)
    join_buffer_size (> 256.0K, or always use indexes with joins)
    tmp_table_size (> 16M)
    max_heap_table_size (> 16M)
    innodb_buffer_pool_size (>= 1G) if possible.
    innodb_buffer_pool_instances (=1)

Best Answer

Analysis of GLOBAL STATUS and VARIABLES:

Observations:

  • Version: 10.1.47-MariaDB-0+deb9u1
  • 64 GB of RAM
  • Uptime = 02:27:19; Please rerun SHOW GLOBAL STATUS after several hours.
  • You are not running on Windows.
  • Running 64-bit version
  • You appear to be running entirely (or mostly) InnoDB.

The More Important Issues:

Very important: Increase innodb_buffer_pool_size to at least 2G, but not more than 70% of available RAM (after allowing room for your app)

Very important: Increase the open file limit in the OS so that MariaDB can increase some of its settings.

If you have SSD, up innodb_io_capacity and innodb_io_capacity_max to 1000 and 2000.

innodb_log_file_size = 4G -- (There may be difficulty is changing this. If so, skip for now.)

Query cache -- Can't tell at the moment whether OFF would be better or changing some setting would b e better. Work on the other this first; come back to this.

There are some slow queries. Find them and let's work on speeding them up. See http://mysql.rjweb.org/doc.php/mysql_analysis#slow_queries_and_slowlog

About 1/3 of the queries are "admin" type. What is going on?

If you capture the GLOBAL STATUS and VARIABLES again after fixing the above things, be sure to have it up at least 24 hours.

Details and other observations:

( (key_buffer_size / 0.20 + innodb_buffer_pool_size / 0.70) ) = ((16M / 0.20 + 128M / 0.70)) / 65536M = 0.40% -- Most of available ram should be made available for caching. -- http://mysql.rjweb.org/doc.php/memory

( Key_reads + Key_writes + Innodb_pages_read + Innodb_pages_written + Innodb_dblwr_writes + Innodb_buffer_pool_pages_flushed ) = (4 + 0 + 40835633 + 1094901 + 53370 + 1094779) / 8839 = 4873 /sec -- IOPs? -- If the hardware can handle it, set innodb_io_capacity (now 200) to about this value.

( ( Key_reads + Key_writes + Innodb_pages_read + Innodb_pages_written + Innodb_dblwr_writes + Innodb_buffer_pool_pages_flushed ) / innodb_io_capacity / Uptime ) = ( 4 + 0 + 40835633 + 1094901 + 53370 + 1094779 ) / 200 / 8839 = 2436.9% -- This may be a metric indicating what innodb_io_capacity is set reasonably. -- Increase innodb_io_capacity (now 200) if the hardware can handle it.

( innodb_buffer_pool_size ) = 128M -- InnoDB Data + Index cache -- 128M (an old default) is woefully small.

( innodb_lru_scan_depth ) = 1,024 -- "InnoDB: page_cleaner: 1000ms intended loop took ..." may be fixed by lowering lru_scan_depth

( Innodb_pages_written / Innodb_buffer_pool_write_requests ) = 1,094,901 / 3914981 = 28.0% -- Write requests that had to hit disk -- Check innodb_buffer_pool_size (now 134217728)

( Innodb_buffer_pool_reads ) = 20,609,583 / 8839 = 2331 /sec -- Cache misses in the buffer_pool. -- Increase innodb_buffer_pool_size (now 134217728)? (~100 is limit for HDD, ~1000 is limit for SSDs.)

( (Innodb_buffer_pool_reads + Innodb_buffer_pool_pages_flushed) ) = ((20609583 + 1094779) ) / 8839 = 2455 /sec -- InnoDB I/O -- Increase innodb_buffer_pool_size (now 134217728)?

( Innodb_buffer_pool_pages_flushed ) = 1,094,779 / 8839 = 123 /sec -- Writes (flushes) -- Increase innodb_buffer_pool_size (now 134217728)?

( Innodb_buffer_pool_read_ahead_evicted ) = 47,759 / 8839 = 5.4 /sec

( Innodb_log_writes ) = 380,439 / 8839 = 43 /sec

( innodb_flush_method ) = innodb_flush_method = -- How InnoDB should ask the OS to write blocks. Suggest O_DIRECT or O_ALL_DIRECT (Percona) to avoid double buffering. (At least for Unix.) See chrischandler for caveat about O_ALL_DIRECT

( default_tmp_storage_engine ) = default_tmp_storage_engine =

( Innodb_row_lock_waits ) = 4,671 / 8839 = 0.53 /sec -- How often there is a delay in getting a row lock. -- May be caused by complex queries that could be optimized.

( Innodb_row_lock_waits/Innodb_rows_inserted ) = 4,671/14717 = 31.7% -- Frequency of having to wait for a row.

( Innodb_dblwr_writes ) = 53,370 / 8839 = 6 /sec -- "Doublewrite buffer" writes to disk. "Doublewrites" are a reliability feature. Some newer versions / configurations don't need them. -- (Symptom of other issues)

( innodb_flush_neighbors ) = 1 -- A minor optimization when writing blocks to disk. -- Use 0 for SSD drives; 1 for HDD.

( ( Innodb_pages_read + Innodb_pages_written ) / Uptime / innodb_io_capacity ) = ( 40835633 + 1094901 ) / 8839 / 200 = 2371.9% -- If > 100%, need more io_capacity. -- Increase innodb_io_capacity (now 200) if the drives can handle it.

( innodb_io_capacity ) = 200 -- I/O ops per second capable on disk . 100 for slow drives; 200 for spinning drives; 1000-2000 for SSDs; multiply by RAID factor.

( sync_binlog ) = 0 -- Use 1 for added security, at some cost of I/O =1 may lead to lots of "query end"; =0 may lead to "binlog at impossible position" and lose transactions in a crash, but is faster.

( innodb_print_all_deadlocks ) = innodb_print_all_deadlocks = OFF -- Whether to log all Deadlocks. -- If you are plagued with Deadlocks, turn this on. Caution: If you have lots of deadlocks, this may write a lot to disk.

( max_connections ) = 4,096 -- Maximum number of connections (threads). Impacts various allocations. -- If max_connections (now 4096) is too high and various memory settings are high, you could run out of RAM.

( innodb_buffer_pool_populate ) = OFF = 0 -- NUMA control

( (Com_show_create_table + Com_show_fields) / Questions ) = (82 + 38505) / 2038452 = 1.9% -- Naughty framework -- spending a lot of effort rediscovering the schema. -- Complain to the 3rd party vendor.

( local_infile ) = local_infile = ON -- local_infile (now ON) = ON is a potential security issue

( bulk_insert_buffer_size ) = 8M / 65536M = 0.01% -- Buffer for multi-row INSERTs and LOAD DATA -- Too big could threaten RAM size. Too small could hinder such operations.

( Qcache_hits / Qcache_inserts ) = 530,104 / 525802 = 1.01 -- Hit to insert ratio -- high is good -- Consider turning off the query cache.

( (query_cache_size - Qcache_free_memory) / Qcache_queries_in_cache / query_alloc_block_size ) = (16M - 7255496) / 5280 / 16384 = 0.11 -- query_alloc_block_size vs formula -- Adjust query_alloc_block_size (now 16384)

( Created_tmp_disk_tables ) = 40,027 / 8839 = 4.5 /sec -- Frequency of creating disk "temp" tables as part of complex SELECTs -- increase tmp_table_size (now 16777216) and max_heap_table_size (now 16777216). Check the rules for temp tables on when MEMORY is used instead of MyISAM. Perhaps minor schema or query changes can avoid MyISAM. Better indexes and reformulation of queries are more likely to help.

( Created_tmp_disk_tables / Created_tmp_tables ) = 40,027 / 52361 = 76.4% -- Percent of temp tables that spilled to disk -- Maybe increase tmp_table_size (now 16777216) and max_heap_table_size (now 16777216); improve indexes; avoid blobs, etc.

( Handler_read_rnd_next ) = 23,650,792,212 / 8839 = 2675731 /sec -- High if lots of table scans -- possibly inadequate keys

( Handler_read_rnd_next / Com_select ) = 23,650,792,212 / 1204725 = 19,631 -- Avg rows scanned per SELECT. (approx) -- Consider raising read_buffer_size (now 131072)

( (Com_insert + Com_update + Com_delete + Com_replace) / Com_commit ) = (12162 + 457998 + 1321 + 675) / 3 = 157,385 -- Statements per Commit (assuming all InnoDB) -- Low: Might help to group queries together in transactions; High: long transactions strain various things.

( Select_scan ) = 183,655 / 8839 = 21 /sec -- full table scans -- Add indexes / optimize queries (unless they are tiny tables)

( Select_scan / Com_select ) = 183,655 / 1204725 = 15.2% -- % of selects doing full table scan. (May be fooled by Stored Routines.) -- Add indexes / optimize queries

( Com_insert + Com_delete + Com_delete_multi + Com_replace + Com_update + Com_update_multi ) = (12162 + 1321 + 0 + 675 + 457998 + 0) / 8839 = 53 /sec -- writes/sec -- 50 writes/sec + log flushes will probably max out I/O write capacity of normal drives

( ( Com_stmt_prepare - Com_stmt_close ) / ( Com_stmt_prepare + Com_stmt_close ) ) = ( 1657 - 0 ) / ( 1657 + 0 ) = 100.0% -- Are you closing your prepared statements? -- Add Closes.

( Com_stmt_close / Com_stmt_prepare ) = 0 / 1657 = 0 -- Prepared statements should be Closed. -- Check whether all Prepared statements are "Closed".

( Com_alter_table ) = 575 / 8839 = 0.065 /sec -- Why so many ALTERs?

( Com_admin_commands ) = 664,278 / 8839 = 75 /sec -- Why so many DDL statements?

( Com_admin_commands / Queries ) = 664,278 / 2040106 = 32.6% -- Percent of queries that are "admin" commands. -- What's going on?

( binlog_format ) = binlog_format = STATEMENT -- STATEMENT/ROW/MIXED. -- ROW is preferred by 5.7 (10.3)

( innodb_autoinc_lock_mode ) = 1 -- Galera: desires 2 -- 2 = "interleaved"; 1 = "consecutive" is typical; 0 = "traditional". -- Galera desires 2; 2 requires BINLOG_FORMAT=ROW or MIXED

( slow_query_log ) = slow_query_log = OFF -- Whether to log slow queries. (5.1.12)

( long_query_time ) = 10 -- Cutoff (Seconds) for defining a "slow" query. -- Suggest 2

( log_slow_slave_statements ) = log_slow_slave_statements = OFF -- (5.6.11, 5.7.1) By default, replicated statements won't show up in the slowlog; this causes them to show. -- It can be helpful in the slowlog to see writes that could be interfering with Replica reads.

( Max_used_connections ) = 2,102 -- High-water mark for connections -- Lots of inactive connections is OK; over 100 active connections is likely to be a problem. Max_used_connections (now 2102) does not distinguish them; Threads_running (now 1) is instantaneous.

( Max_used_connections / host_cache_size ) = 2,102 / 807 = 260.5% -- Increase host_cache_size (now 807)

( Connections ) = 57,093 / 8839 = 6.5 /sec -- Connections -- Increase wait_timeout (now 28800); use pooling?

Abnormally small:

Acl_users = 3
Innodb_mem_adaptive_hash = 2.25e+6
Innodb_mem_dictionary = 1.75e+6

Abnormally large:

( Innodb_pages_read + Innodb_pages_written ) / Uptime = 4,743
Aria_pagecache_reads = 4.5 /sec
Com_alter_table + Com_flush = 0.065 /sec
Com_create_db = 10 /HR
Com_create_index = 0.15 /sec
Com_create_table = 0.071 /sec
Com_replace_select = 77 /HR
Com_show_databases = 40 /HR
Com_show_fields = 4.4 /sec
Com_show_keys = 0.036 /sec
Com_show_master_status = 0.093 /sec
Com_show_processlist = 0.089 /sec
Com_show_slave_hosts = 1.2 /HR
Com_show_slave_status = 0.093 /sec
Com_show_storage_engines = 1.2 /HR
Com_show_tables = 0.61 /sec
Feature_locale = 0.12 /sec
Handler_discover = 21 /HR
Handler_read_rnd_next / Handler_read_rnd = 284,534
Innodb_buffer_pool_pages_made_not_young = 264922 /sec
Innodb_buffer_pool_pages_made_young = 1164 /sec
Innodb_buffer_pool_read_ahead = 2288 /sec
Innodb_buffer_pool_read_requests = 1744557 /sec
Innodb_buffer_pool_reads * innodb_page_size / innodb_buffer_pool_size = 251581.8%
Innodb_buffered_aio_submitted = 2288 /sec
Innodb_data_read = 75693018 /sec
Innodb_data_reads = 4619 /sec
Innodb_mutex_spin_waits = 2698 /sec
Innodb_num_index_pages_written = 48 /sec
Innodb_num_non_index_pages_written = 125 /sec
Innodb_pages_read = 4619 /sec
Innodb_pages_read + Innodb_pages_written = 4743 /sec
Innodb_rows_read = 2676665 /sec
Innodb_s_lock_spin_rounds = 6709 /sec
Innodb_s_lock_spin_waits = 1998 /sec
Prepared_stmt_count = 170
Rows_read = 2676658 /sec
Tc_log_page_size = 4,096
Threads_connected = 1,988
host_cache_size = 807
innodb_background_scrub_data_check_interval = 0.41 /sec
innodb_background_scrub_data_interval = 68 /sec
performance_schema_max_cond_instances = 17,684
performance_schema_max_socket_instances = 8,212
performance_schema_max_thread_instances = 8,292

Abnormal strings:

innodb_default_row_format = compact
innodb_fast_shutdown = 1
myisam_stats_method = NULLS_UNEQUAL