MySQL Killed by OOM after Corrupted Database

corruptionMySQLmysql-5.5

One of our web-app servers recently had an issue where MySQL was taking more memory than we have physical RAM. This server has 64GB of memory and 8GB in swap, and MySQL was OOM-ed because it was using over 69GB of memory.

After the fact I found logs indicating that one of our database tables had become corrupted several hours before the issue occurred, and there are literally millions of log entries in the three hours before the OOM complaining about this table being corrupt.

I want to understand why this happened so that I can prevent it. Here are the relevant portions of my MySQL configuration:

max_connections = 1000
table_open_cache = 1000
query_cache_size = 1G

innodb_buffer_pool_size = 16GB
innodb_buffer_pool_instances = 4
innodb_log_file_size = 1G
innodb_log_buffer_size = 24M

I've also read these similar questions:

I don't believe the answers to those questions apply (I checked the variables they mention), and I don't see how MySQL could reasonably use ~49GB of memory for 1000 connections. However, there were a lot of connections open at the time of the crash, and I noticed afterward that an insert ... on duplicate key update ... query against the corrupted table (which was MyISAM) was hanging indefinitely.

Does anyone know why MySQL would have consumed so much memory so suddenly (MySQL usually uses ~24GB of memory on this server)? Was the corrupted table related, or was this caused by a misconfiguration?

Note: I have completed a memory test on this server without errors.


Show variables:

+---------------------------------------------------+---
| Variable_name                                     | Value
+---------------------------------------------------+---
| auto_increment_increment                          | 1
| auto_increment_offset                             | 1
| autocommit                                        | ON
| automatic_sp_privileges                           | ON
| back_log                                          | 50
| big_tables                                        | OFF
| binlog_cache_size                                 | 32768
| binlog_direct_non_transactional_updates           | OFF
| binlog_format                                     | MIXED
| binlog_stmt_cache_size                            | 32768
| bulk_insert_buffer_size                           | 8388608
| completion_type                                   | NO_CHAIN
| concurrent_insert                                 | AUTO
| connect_timeout                                   | 10
| default_storage_engine                            | InnoDB
| delay_key_write                                   | ON
| delayed_insert_limit                              | 100
| delayed_insert_timeout                            | 300
| delayed_queue_size                                | 1000
| div_precision_increment                           | 4
| engine_condition_pushdown                         | ON
| error_count                                       | 0
| event_scheduler                                   | OFF
| foreign_key_checks                                | ON
| group_concat_max_len                              | 1024
| ignore_builtin_innodb                             | OFF
| innodb_adaptive_flushing                          | ON
| innodb_adaptive_hash_index                        | ON
| innodb_additional_mem_pool_size                   | 8388608
| innodb_autoextend_increment                       | 8
| innodb_autoinc_lock_mode                          | 1
| innodb_buffer_pool_instances                      | 4
| innodb_buffer_pool_size                           | 17179869184
| innodb_change_buffering                           | all
| innodb_checksums                                  | ON
| innodb_commit_concurrency                         | 0
| innodb_concurrency_tickets                        | 500
| innodb_data_file_path                             | ibdata1:10M:autoextend
| innodb_doublewrite                                | ON
| innodb_fast_shutdown                              | 1
| innodb_file_format                                | Antelope
| innodb_file_format_check                          | ON
| innodb_file_format_max                            | Antelope
| innodb_file_per_table                             | ON
| innodb_flush_log_at_trx_commit                    | 0
| innodb_flush_method                               | O_DIRECT
| innodb_force_load_corrupted                       | OFF
| innodb_force_recovery                             | 0
| innodb_io_capacity                                | 200
| innodb_large_prefix                               | OFF
| innodb_lock_wait_timeout                          | 50
| innodb_locks_unsafe_for_binlog                    | OFF
| innodb_log_buffer_size                            | 25165824
| innodb_log_file_size                              | 1073741824
| innodb_log_files_in_group                         | 2
| innodb_max_dirty_pages_pct                        | 75
| innodb_max_purge_lag                              | 0
| innodb_mirrored_log_groups                        | 1
| innodb_old_blocks_pct                             | 37
| innodb_old_blocks_time                            | 0
| innodb_open_files                                 | 300
| innodb_print_all_deadlocks                        | OFF
| innodb_purge_batch_size                           | 20
| innodb_purge_threads                              | 0
| innodb_random_read_ahead                          | OFF
| innodb_read_ahead_threshold                       | 56
| innodb_read_io_threads                            | 4
| innodb_rollback_on_timeout                        | OFF
| innodb_rollback_segments                          | 128
| innodb_spin_wait_delay                            | 6
| innodb_stats_method                               | nulls_equal
| innodb_stats_on_metadata                          | ON
| innodb_stats_sample_pages                         | 8
| innodb_strict_mode                                | OFF
| innodb_support_xa                                 | ON
| innodb_sync_spin_loops                            | 30
| innodb_table_locks                                | ON
| innodb_thread_concurrency                         | 0
| innodb_thread_sleep_delay                         | 10000
| innodb_use_native_aio                             | ON
| innodb_use_sys_malloc                             | ON
| innodb_version                                    | 5.5.44
| innodb_write_io_threads                           | 4
| interactive_timeout                               | 28800
| join_buffer_size                                  | 131072
| keep_files_on_create                              | OFF
| key_buffer_size                                   | 67108864
| key_cache_age_threshold                           | 300
| key_cache_block_size                              | 1024
| key_cache_division_limit                          | 100
| large_files_support                               | ON
| large_page_size                                   | 0
| large_pages                                       | OFF
| local_infile                                      | ON
| lock_wait_timeout                                 | 31536000
| locked_in_memory                                  | OFF
| low_priority_updates                              | OFF
| max_allowed_packet                                | 134217728
| max_binlog_size                                   | 104857600
| max_connect_errors                                | 10
| max_connections                                   | 1000
| max_delayed_threads                               | 20
| max_error_count                                   | 64
| max_heap_table_size                               | 16777216
| max_insert_delayed_threads                        | 20
| max_length_for_sort_data                          | 1024
| max_long_data_size                                | 134217728
| max_prepared_stmt_count                           | 16382
| max_relay_log_size                                | 0
| max_sort_length                                   | 1024
| max_sp_recursion_depth                            | 0
| max_tmp_tables                                    | 32
| max_user_connections                              | 0
| metadata_locks_cache_size                         | 1024
| min_examined_row_limit                            | 0
| multi_range_count                                 | 256
| myisam_data_pointer_size                          | 6
| myisam_recover_options                            | BACKUP
| myisam_repair_threads                             | 1
| myisam_sort_buffer_size                           | 8388608
| myisam_stats_method                               | nulls_unequal
| myisam_use_mmap                                   | OFF
| net_buffer_length                                 | 16384
| net_read_timeout                                  | 30
| net_retry_count                                   | 10
| net_write_timeout                                 | 60
| open_files_limit                                  | 65500
| optimizer_prune_level                             | 1
| optimizer_search_depth                            | 62
| preload_buffer_size                               | 32768
| protocol_version                                  | 10
| query_alloc_block_size                            | 8192
| query_cache_limit                                 | 1048576
| query_cache_min_res_unit                          | 4096
| query_cache_size                                  | 1073741824
| query_cache_type                                  | ON
| query_cache_wlock_invalidate                      | OFF
| query_prealloc_size                               | 8192
| range_alloc_block_size                            | 4096
| read_buffer_size                                  | 131072
| read_only                                         | OFF
| read_rnd_buffer_size                              | 262144
| rpl_recovery_rank                                 | 0
| skip_external_locking                             | ON
| sort_buffer_size                                  | 2097152
| sql_big_selects                                   | ON
| sql_big_tables                                    | OFF
| sql_buffer_result                                 | OFF
| storage_engine                                    | InnoDB
| stored_program_cache                              | 256
| sync_binlog                                       | 0
| sync_frm                                          | ON
| sync_master_info                                  | 0
| sync_relay_log                                    | 0
| sync_relay_log_info                               | 0
| table_definition_cache                            | 400
| table_open_cache                                  | 1000
| thread_cache_size                                 | 8
| thread_concurrency                                | 10
| thread_handling                                   | one-thread-per-connection
| thread_stack                                      | 196608
| time_zone                                         | SYSTEM
| timed_mutexes                                     | OFF
| tmp_table_size                                    | 16777216
| transaction_alloc_block_size                      | 8192
| transaction_prealloc_size                         | 4096
| tx_isolation                                      | REPEATABLE-READ
| unique_checks                                     | ON
| updatable_views_with_limit                        | YES
| wait_timeout                                      | 28800
| warning_count                                     | 0
+---------------------------------------------------+--

Show Global Status:

+------------------------------------------+---------------+
| Variable_name                            | Value         |
+------------------------------------------+---------------+
| Aborted_clients                          | 9             |
| Aborted_connects                         | 5201          |
| Binlog_cache_disk_use                    | 566           |
| Binlog_cache_use                         | 59455472      |
| Binlog_stmt_cache_disk_use               | 170           |
| Binlog_stmt_cache_use                    | 7980756       |
| Bytes_received                           | 118197326576  |
| Bytes_sent                               | 936083868833  |
| Connections                              | 15785450      |
| Created_tmp_disk_tables                  | 37787519      |
| Created_tmp_files                        | 1385          |
| Created_tmp_tables                       | 70916100      |
| Delayed_errors                           | 0             |
| Delayed_insert_threads                   | 0             |
| Delayed_writes                           | 0             |
| Flush_commands                           | 78            |
| Handler_commit                           | 306649532     |
| Handler_delete                           | 232632        |
| Handler_discover                         | 0             |
| Handler_prepare                          | 93231428      |
| Handler_read_first                       | 60815625      |
| Handler_read_key                         | 885609313     |
| Handler_read_last                        | 838614        |
| Handler_read_next                        | 4197678356    |
| Handler_read_prev                        | 357472833     |
| Handler_read_rnd                         | 101193542     |
| Handler_read_rnd_next                    | 52819751104   |
| Handler_rollback                         | 28462         |
| Handler_savepoint                        | 0             |
| Handler_savepoint_rollback               | 0             |
| Handler_update                           | 64446304      |
| Handler_write                            | 412627899     |
| Innodb_buffer_pool_pages_data            | 1048088       |
| Innodb_buffer_pool_bytes_data            | 17171873792   |
| Innodb_buffer_pool_pages_dirty           | 431           |
| Innodb_buffer_pool_bytes_dirty           | 7061504       |
| Innodb_buffer_pool_pages_flushed         | 43659545      |
| Innodb_buffer_pool_pages_free            | 2             |
| Innodb_buffer_pool_pages_misc            | 482           |
| Innodb_buffer_pool_pages_total           | 1048572       |
| Innodb_buffer_pool_read_ahead_rnd        | 0             |
| Innodb_buffer_pool_read_ahead            | 848473        |
| Innodb_buffer_pool_read_ahead_evicted    | 358935        |
| Innodb_buffer_pool_read_requests         | 23851889088   |
| Innodb_buffer_pool_reads                 | 132872602     |
| Innodb_buffer_pool_wait_free             | 0             |
| Innodb_buffer_pool_write_requests        | 454509933     |
| Innodb_data_fsyncs                       | 14157861      |
| Innodb_data_pending_fsyncs               | 0             |
| Innodb_data_pending_reads                | 0             |
| Innodb_data_pending_writes               | 0             |
| Innodb_data_read                         | 2191680999424 |
| Innodb_data_reads                        | 134331079     |
| Innodb_data_writes                       | 49646417      |
| Innodb_data_written                      | 1460362066432 |
| Innodb_dblwr_pages_written               | 43659545      |
| Innodb_dblwr_writes                      | 1967389       |
| Innodb_have_atomic_builtins              | ON            |
| Innodb_log_waits                         | 0             |
| Innodb_log_write_requests                | 55918265      |
| Innodb_log_writes                        | 2817833       |
| Innodb_os_log_fsyncs                     | 3218361       |
| Innodb_os_log_pending_fsyncs             | 0             |
| Innodb_os_log_pending_writes             | 0             |
| Innodb_os_log_written                    | 29521291776   |
| Innodb_page_size                         | 16384         |
| Innodb_pages_created                     | 1082758       |
| Innodb_pages_read                        | 133769705     |
| Innodb_pages_written                     | 43659545      |
| Innodb_row_lock_current_waits            | 0             |
| Innodb_row_lock_time                     | 441363        |
| Innodb_row_lock_time_avg                 | 1219          |
| Innodb_row_lock_time_max                 | 51820         |
| Innodb_row_lock_waits                    | 362           |
| Innodb_rows_deleted                      | 210683        |
| Innodb_rows_inserted                     | 6673020       |
| Innodb_rows_read                         | 55723159186   |
| Innodb_rows_updated                      | 61534004      |
| Innodb_truncated_status_writes           | 0             |
| Key_blocks_not_flushed                   | 0             |
| Key_blocks_unused                        | 50984         |
| Key_blocks_used                          | 53585         |
| Key_read_requests                        | 1649235730    |
| Key_reads                                | 90119161      |
| Key_write_requests                       | 4078153       |
| Key_writes                               | 2287803       |
| Last_query_cost                          | 0.000000      |
| Max_used_connections                     | 219           |
| Not_flushed_delayed_rows                 | 0             |
| Open_files                               | 842           |
| Open_streams                             | 0             |
| Open_table_definitions                   | 921           |
| Open_tables                              | 1000          |
| Opened_files                             | 504435699     |
| Opened_table_definitions                 | 115393300     |
| Opened_tables                            | 112827119     |
| Prepared_stmt_count                      | 0             |
| Qcache_free_blocks                       | 58397         |
| Qcache_free_memory                       | 377606768     |
| Qcache_hits                              | 600978542     |
| Qcache_inserts                           | 160064226     |
| Qcache_lowmem_prunes                     | 17296550      |
| Qcache_not_cached                        | 66882759      |
| Qcache_queries_in_cache                  | 471854        |
| Qcache_total_blocks                      | 1013507       |
| Queries                                  | 1204594580    |
| Questions                                | 1201397855    |
| Rpl_status                               | AUTH_MASTER   |
| Select_full_join                         | 152194        |
| Select_full_range_join                   | 0             |
| Select_range                             | 9798991       |
| Select_range_check                       | 0             |
| Select_scan                              | 116607434     |
| Slave_heartbeat_period                   | 30.000        |
| Slave_open_temp_tables                   | 0             |
| Slave_received_heartbeats                | 2559          |
| Slave_retried_transactions               | 0             |
| Slave_running                            | OFF           |
| Slow_launch_threads                      | 0             |
| Slow_queries                             | 3255          |
| Sort_merge_passes                        | 3             |
| Sort_range                               | 14976751      |
| Sort_rows                                | 144820896     |
| Sort_scan                                | 11605905      |
| Table_locks_immediate                    | 380794869     |
| Table_locks_waited                       | 105           |
| Tc_log_max_pages_used                    | 0             |
| Tc_log_page_size                         | 0             |
| Tc_log_page_waits                        | 0             |
| Threads_cached                           | 7             |
| Threads_connected                        | 3             |
| Threads_created                          | 2873          |
| Threads_running                          | 3             |
| Uptime                                   | 4892974       |
| Uptime_since_flush_status                | 4892974       |
+------------------------------------------+---------------+

Best Answer

query_cache_size = 1G -- This is really bad. Don't set it bigger than about 50M. It won't directly cause a crash, but it can cause lengthy stalls, which could lead to a crash.

When any write occurs to a table, all entries in the QC for that table is purged. The bigger the QC the longer that takes.

The QC is "in front of" the engines (MyISAM and InnoDB). That is, first the QC is checked to see if the exact query is still there, in which case the resultset is returned. Otherwise, the engine takes over and does a different level of caching (key_buffer + OS space for MyISAM; buffer_pool for InnoDB).

max_connections = 1000 -- What "client" do you have? Can it really produce that many connections?

A likely causes tmp_table_size and max_heap_table_size should not be bigger than, say, 1% of RAM. Else you are at a big risk of running out of RAM when a lot of connections perform complex queries.

MyISAM's indexes become corrupted whenever there is an ungraceful shutdown. This is a strong reason to switch to InnoDB.

Did you raise key_buffer_size? It should not be bigger than about 20% of RAM.

Please provide SHOW VARIABLES; and SHOW GLOBAL STATUS;; I will check to see what else might be over-sized.