MySQL 5.6 Daemon Crashes Daily – Lock Error Solution

Tags: innodb, MySQL, mysql-5.6

For a while now our MySQL server has been crashing daily, sometimes multiple times a day. I've done a lot of troubleshooting but nothing seems to help, and since it's a live server I have to be careful with tests and experiments. I even cloned the entire server so I could test more thoroughly, but I'm unable to reproduce the errors there, so I'm fairly certain the crashes only happen under a certain amount of load.

We're on CentOS 6.8 x86_64, running MySQL 5.6.29 (we're unable to upgrade due to circumstances). The server has 8 GB of RAM; usually about 4 GB of that is "cached" memory, with only a few hundred MB actually free. All databases use the built-in InnoDB engine as far as I can tell.

Error logs

The database and .ibd file in the first line are different every time, so it isn't a case of one particular database/table being corrupt. If it were, the cloned test server would have crashed too.

2017-02-10 02:39:47 24223 [ERROR] InnoDB: Unable to lock ./mydb/field_revision_field_tel_nr_.ibd, error: 37
2017-02-10 02:39:47 2ab04c081700  InnoDB: Assertion failure in thread 46936678209280 in file fil0fil.cc line 875
InnoDB: Failing assertion: ret
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
01:39:47 UTC - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.

key_buffer_size=33554432
read_buffer_size=131072
max_used_connections=18
max_threads=300
thread_count=6
connection_count=5
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 151807 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0xf906670
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 2ab04c080e10 thread_stack 0x40000
/usr/sbin/mysqld(my_print_stacktrace+0x35)[0x91f1d5]
/usr/sbin/mysqld(handle_fatal_signal+0x3d8)[0x678e78]
/lib64/libpthread.so.0(+0xf7e0)[0x2aaf7aff57e0]
/lib64/libc.so.6(gsignal+0x35)[0x2aaf7c21c5e5]
/lib64/libc.so.6(abort+0x175)[0x2aaf7c21ddc5]
/usr/sbin/mysqld[0xa8645b]
/usr/sbin/mysqld[0xa8670e]
/usr/sbin/mysqld[0xa8d119]
/usr/sbin/mysqld[0xa57a3b]
/usr/sbin/mysqld[0xa580ab]
/usr/sbin/mysqld[0xa45b1a]
/usr/sbin/mysqld[0xa98716]
/usr/sbin/mysqld[0x940e8a]
/usr/sbin/mysqld[0x72cd36]
/usr/sbin/mysqld[0x73c1fd]
/usr/sbin/mysqld(_Z14get_all_tablesP3THDP10TABLE_LISTP4Item+0x665)[0x73c975]
/usr/sbin/mysqld(_Z24get_schema_tables_resultP4JOIN23enum_schema_table_state+0x2cd)[0x72942d]
/usr/sbin/mysqld(_ZN4JOIN14prepare_resultEPP4ListI4ItemE+0x6d)[0x71c2ad]
/usr/sbin/mysqld(_ZN4JOIN4execEv+0xfd)[0x6d727d]
/usr/sbin/mysqld[0x71ee39]
/usr/sbin/mysqld(_Z12mysql_selectP3THDP10TABLE_LISTjR4ListI4ItemEPS4_P10SQL_I_ListI8st_orderESB_S7_yP13select_resultP18st_select_lex_unitP13st_select_lex+0xbc)[0x71f8fc]
/usr/sbin/mysqld(_Z13handle_selectP3THDP13select_resultm+0x175)[0x71fb05]
/usr/sbin/mysqld[0x6f9929]
/usr/sbin/mysqld(_Z21mysql_execute_commandP3THD+0x34ae)[0x6fe01e]
/usr/sbin/mysqld(_Z11mysql_parseP3THDPcjP12Parser_state+0x338)[0x701d48]
/usr/sbin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcj+0x1231)[0x703761]
/usr/sbin/mysqld(_Z10do_commandP3THD+0xd7)[0x705037]
/usr/sbin/mysqld(_Z24do_handle_one_connectionP3THD+0x116)[0x6cb956]
/usr/sbin/mysqld(handle_one_connection+0x45)[0x6cba35]
/usr/sbin/mysqld(pfs_spawn_thread+0x126)[0xaf56f6]
/lib64/libpthread.so.0(+0x7aa1)[0x2aaf7afedaa1]
/lib64/libc.so.6(clone+0x6d)[0x2aaf7c2d2aad]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (2ab054008b90): is an invalid pointer
Connection ID (thread ID): 1435
Status: NOT_KILLED

The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
170210 02:39:48 mysqld_safe Number of processes running now: 0
170210 02:39:48 mysqld_safe mysqld restarted
2017-02-10 02:39:49 0 [Note] /usr/sbin/mysqld (mysqld 5.6.29) starting as process 5207 ...
2017-02-10 02:39:50 5207 [Note] Plugin 'FEDERATED' is disabled.
2017-02-10 02:39:50 5207 [Note] InnoDB: Using atomics to ref count buffer pool pages
2017-02-10 02:39:50 5207 [Note] InnoDB: The InnoDB memory heap is disabled
2017-02-10 02:39:50 5207 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2017-02-10 02:39:50 5207 [Note] InnoDB: Memory barrier is not used
2017-02-10 02:39:50 5207 [Note] InnoDB: Compressed tables use zlib 1.2.3
2017-02-10 02:39:50 5207 [Note] InnoDB: Using Linux native AIO
2017-02-10 02:39:50 5207 [Note] InnoDB: Using CPU crc32 instructions
2017-02-10 02:39:50 5207 [Note] InnoDB: Initializing buffer pool, size = 128.0M
2017-02-10 02:39:50 5207 [Note] InnoDB: Completed initialization of buffer pool
2017-02-10 02:39:50 5207 [Note] InnoDB: Highest supported file format is Barracuda.
2017-02-10 02:39:50 5207 [Note] InnoDB: The log sequence numbers 98028365051 and 98028365051 in ibdata files do not match the log sequence number 101119859781 in the ib_logfiles!
2017-02-10 02:39:50 5207 [Note] InnoDB: Database was not shutdown normally!
2017-02-10 02:39:50 5207 [Note] InnoDB: Starting crash recovery.
2017-02-10 02:39:50 5207 [Note] InnoDB: Reading tablespace information from the .ibd files...
2017-02-10 02:40:35 5207 [Note] InnoDB: Restoring possible half-written data pages
2017-02-10 02:40:36 5207 [Note] InnoDB: 128 rollback segment(s) are active.
2017-02-10 02:40:36 5207 [Note] InnoDB: Waiting for purge to start
2017-02-10 02:40:36 5207 [Note] InnoDB: 5.6.29 started; log sequence number 101119859781
2017-02-10 02:40:36 5207 [Note] Server hostname (bind-address): '*'; port: 3306
2017-02-10 02:40:36 5207 [Note] IPv6 is available.
2017-02-10 02:40:36 5207 [Note]   - '::' resolves to '::';
2017-02-10 02:40:36 5207 [Note] Server socket created on IP: '::'.
2017-02-10 02:40:37 5207 [Note] Event Scheduler: Loaded 1 event
2017-02-10 02:40:37 5207 [Note] /usr/sbin/mysqld: ready for connections.
Version: '5.6.29'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  MySQL Community Server (GPL)
2017-02-10 02:40:37 5207 [Note] Event Scheduler: scheduler thread started with id 1
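
For reference, the OS error number in the first line of the crash can be decoded with MySQL's bundled perror utility; errno 37 on Linux is ENOLCK ("No locks available"), which suggests a file-locking failure at the OS/filesystem level rather than corruption of that particular table. A quick check looks like this:

perror 37
# typically prints something like: OS error code  37:  No locks available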

my.cnf

I used the Percona Configuration Wizard to generate most of the configuration file. I'm not entirely sure why it ends up containing both innodb_file_per_table and innodb-file-per-table (MySQL treats dashes and underscores in option names interchangeably, so the duplicate is redundant but harmless), but I've left it for now.

[mysqld]
event_scheduler=on
local-infile=0
innodb_file_per_table = 1
max_allowed_packet = 256M
explicit_defaults_for_timestamp

# PERCONA WIZARD START #
key-buffer-size = 32M

# Caches/limits
tmp-table-size = 32M
max-heap-table-size = 32M
query-cache-type = 0
query-cache-size = 0
max-connections = 300
thread-cache-size = 50
open-files-limit = 65535
table-definition-cache = 4096
table-open-cache = 10240

innodb-flush-method = O_DIRECT
innodb-log-files-in-group = 2
innodb-log-file-size = 64M
innodb-flush-log-at-trx-commit = 1
innodb-file-per-table = 1
innodb-buffer-pool-size = 128M
# PERCONA WIZARD END #

Also, below is the rule from /etc/security/limits.conf. The number is higher than open-files-limit to give it some headroom, just in case.

mysql            -       nofile          81920
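
It's also worth double-checking that the running mysqld process actually picked up that limit, since limits.conf only applies to processes started through a PAM session; a minimal check (assuming a single mysqld process) would be:

# Show the open-files limit the running mysqld actually received
grep "Max open files" /proc/$(pidof mysqld)/limits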

Any ideas on how to resolve this?

Best Answer

I actually figured this out sometime in 2017, but forgot to post an answer. Since it's been a while I may have forgotten a few details, but the following description should still be pretty accurate.

In order to further troubleshoot the issue we decided to use Percona Monitoring and Management. I also enabled the standard slow query log. After a couple days of data collection we saw a couple of things:

  • The InnoDB buffer pool was churning heavily, with pages constantly being read in and flushed back out
  • There were many pending reads and writes (see the status-variable check below this list)
  • For one particular database a lot of queries were reported as being slow (Magento website)
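
These symptoms can also be spotted without PMM by watching InnoDB's standard status counters; the sketch below is a minimal example, and what counts as "a lot" depends entirely on the workload:

# Buffer pool churn: disk reads vs. read requests, and pages flushed out
mysql -e "SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%'"
mysql -e "SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages_flushed'"

# Pending I/O: values that stay above zero suggest the disks can't keep up
mysql -e "SHOW GLOBAL STATUS LIKE 'Innodb_data_pending%'"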

So we doubled the server memory (to 16 GB) and significantly increased the InnoDB buffer pool size (from 128 MB to 6 GB), along with a few other buffers. We then had to make sure there wasn't any lingering corruption: we "simply" started MySQL with innodb_force_recovery = 4, dumped and dropped all tables, restarted MySQL in normal mode and imported everything back. Keep in mind that a recovery level of 4 or higher can permanently corrupt data. In our case the server would still crash at level 3, hence going to 4.
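
Roughly, the dump-and-reimport cycle looked like the sketch below (the dump file name and the --all-databases scope are illustrative rather than exactly what we ran):

# 1. Put InnoDB into forced-recovery mode: add to my.cnf and restart mysqld
#      [mysqld]
#      innodb_force_recovery = 4

# 2. Dump everything while the server is in forced-recovery mode
mysqldump --all-databases --routines --triggers --events > full_dump.sql

# 3. Drop the dumped tables, remove innodb_force_recovery from my.cnf,
#    restart mysqld normally and import the dump
mysql < full_dump.sql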

The updated my.cnf (some directives were added much later, but they should be unrelated to this particular issue):

[mysqld]
event_scheduler = on
local-infile = 0
skip-host-cache
symbolic-links = 0
character_set_server = utf8
explicit_defaults_for_timestamp

sql_mode = NO_ENGINE_SUBSTITUTION,STRICT_TRANS_TABLES
concurrent_insert = ALWAYS
low_priority_updates = 1
log-queries-not-using-indexes = 1

max_connections = 128
max_allowed_packet = 512M

read_rnd_buffer_size = 16M
join_buffer_size = 4M
sort_buffer_size = 8M

query_cache_type = 1
query_cache_size = 128M
query_cache_limit = 128M
max_heap_table_size = 512M
thread_cache_size = 8192

table_definition_cache = 4096
table_open_cache = 8192
tmp_table_size = 512M
max_tmp_tables = 4096

innodb_file_per_table = 1
innodb_flush_method = O_DIRECT
innodb_thread_concurrency = 8
innodb_read_io_threads = 32
innodb_write_io_threads = 32

innodb_log_file_size = 256M
innodb_buffer_pool_size = 6G
innodb_log_buffer_size = 256M

innodb_monitor_enable = all
performance_schema = ON

### Tweaks for SSDs
# Default = 200
innodb_io_capacity = 3000

# Default = 2000
innodb_io_capacity_max = 6000

As you can see, some variables were actually decreased (such as max_connections) while the buffers generally got an increase. Since the server can now work (almost) entirely from RAM, connections and their queries are handled fast enough that they no longer pile up. Many of the numbers here really depend on the server's specs, the kind of workload it handles and whether it's dedicated to MySQL. Many people recommend roughly 80% of RAM for the buffer pool on a dedicated server, but ours isn't dedicated and isn't that busy; we currently get away with a bit under 40%, and at some point we may have to bump that to 60%, but at least we have RAM to spare for now.
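
A rough way to sanity-check a buffer pool size against the actual data set is to total the InnoDB data and index sizes from information_schema; the query below gives a ballpark figure only (it ignores fragmentation and growth headroom):

# Approximate total InnoDB data + index size, in GB
mysql -e "SELECT ROUND(SUM(data_length + index_length) / 1024 / 1024 / 1024, 2) AS innodb_gb
          FROM information_schema.TABLES WHERE engine = 'InnoDB'"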

To me it's still odd that MySQL decided to hard-abort, which itself likely causes corruption somewhere (and I seem to remember it actually did a few times). Perhaps that's simply because the MySQL version was already fairly old at the time and the problem has been fixed in some later release.