MySQL – Percona Server 5.6.40 restarting with signal 11

mysql-5.6, percona, percona-server

We recently migrated our MySQL server to new hardware, and it ran in slave mode for 15 days. We made it the master on 11th June. On 13th June it restarted for the first time with signal 11.
Stack trace from the first segfault –

04:32:32 UTC - mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
Please help us make Percona Server better by reporting any
bugs at http://bugs.percona.com/

key_buffer_size=33554432
read_buffer_size=131072
max_used_connections=547
max_threads=5002
thread_count=430
connection_count=430
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 2023198 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x2a83900
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7f86b8c24e88 thread_stack 0x30000
/usr/sbin/mysqld(my_print_stacktrace+0x2c)[0x8c66bc]
/usr/sbin/mysqld(handle_fatal_signal+0x469)[0x64d079]
/lib/x86_64-linux-gnu/libpthread.so.0(+0xf890)[0x7f8e9871c890]
/usr/sbin/mysqld(_Z25gtid_pre_statement_checksPK3THD+0x0)[0x848820]
/usr/sbin/mysqld(_Z21mysql_execute_commandP3THD+0x316)[0x6cb8a6]
/usr/sbin/mysqld(_Z11mysql_parseP3THDPcjP12Parser_state+0x5d8)[0x6d15e8]
/usr/sbin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcj+0x117f)[0x6d2eaf]
/usr/sbin/mysqld(_Z24do_handle_one_connectionP3THD+0x1a2)[0x69f962]
/usr/sbin/mysqld(handle_one_connection+0x40)[0x69fa00]
/usr/sbin/mysqld(pfs_spawn_thread+0x146)[0x8fbfe6]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x8064)[0x7f8e98715064]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f8e9675462d]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (7f85e012ee80): is an invalid pointer
Connection ID (thread ID): 247827
Status: NOT_KILLED

Two days later, MySQL restarted three more times with the following traces.
Dump 2 –

02:15:36 UTC - mysqld got signal 11 ;

This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
Please help us make Percona Server better by reporting any
bugs at http://bugs.percona.com/

key_buffer_size=33554432
read_buffer_size=131072
max_used_connections=723
max_threads=5002
thread_count=358
connection_count=358
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 2023198 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x2b73a90
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7f0e12d45e88 thread_stack 0x30000
/usr/sbin/mysqld(my_print_stacktrace+0x2c)[0x8c66bc]
/usr/sbin/mysqld(handle_fatal_signal+0x469)[0x64d079]
/lib/x86_64-linux-gnu/libpthread.so.0(+0xf890)[0x7f15f20b9890]
/usr/sbin/mysqld[0x64b000]
/usr/sbin/mysqld(vio_io_wait+0x76)[0xb77b56]
/usr/sbin/mysqld(vio_socket_io_wait+0x18)[0xb77bf8]
/usr/sbin/mysqld(vio_read+0xca)[0xb77cda]
/usr/sbin/mysqld[0x642203]
/usr/sbin/mysqld[0x6424f4]
/usr/sbin/mysqld(my_net_read+0x304)[0x6432e4]
/usr/sbin/mysqld(_Z10do_commandP3THD+0xca)[0x6d413a]
/usr/sbin/mysqld(_Z24do_handle_one_connectionP3THD+0x1a2)[0x69f962]
/usr/sbin/mysqld(handle_one_connection+0x40)[0x69fa00]
/usr/sbin/mysqld(pfs_spawn_thread+0x146)[0x8fbfe6]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x8064)[0x7f15f20b2064]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f15f00f162d]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0): is an invalid pointer
Connection ID (thread ID): 40900
Status: NOT_KILLED

Dump 3 –

02:36:32 UTC - mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
Please help us make Percona Server better by reporting any
bugs at http://bugs.percona.com/

key_buffer_size=33554432
read_buffer_size=131072
max_used_connections=401
max_threads=5002
thread_count=369
connection_count=369
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 2023198 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x32448f0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7f2fb82c3e88 thread_stack 0x30000
/usr/sbin/mysqld(my_print_stacktrace+0x2c)[0x8c66bc]
/usr/sbin/mysqld(handle_fatal_signal+0x469)[0x64d079]
/lib/x86_64-linux-gnu/libpthread.so.0(+0xf890)[0x7f3792426890]
/usr/sbin/mysqld(_ZN9PROFILING15start_new_queryEPKc+0x0)[0x6e60a0]
/usr/sbin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcj+0x47)[0x6d1d77]
/usr/sbin/mysqld(_Z24do_handle_one_connectionP3THD+0x1a2)[0x69f962]
/usr/sbin/mysqld(handle_one_connection+0x40)[0x69fa00]
/usr/sbin/mysqld(pfs_spawn_thread+0x146)[0x8fbfe6]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x8064)[0x7f379241f064]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f379045e62d]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0): is an invalid pointer
Connection ID (thread ID): 482
Status: NOT_KILLED

After these restarts we did a master/slave switch and took this box out of the active cluster. We then ran a simple sysbench oltp_read_write test against it to see whether the crash would happen again (the approximate invocation is sketched after the sysbench logs below). Two days after starting the benchmark, it happened again on the same machine with this trace –

10:51:18 UTC - mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
Please help us make Percona Server better by reporting any
bugs at http://bugs.percona.com/

key_buffer_size=33554432
read_buffer_size=131072
max_used_connections=4
max_threads=5002
thread_count=3
connection_count=3
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 2023198 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x22c8270
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7f65ec060e88 thread_stack 0x30000
/usr/sbin/mysqld(my_print_stacktrace+0x2c)[0x8c66bc]
/usr/sbin/mysqld(handle_fatal_signal+0x469)[0x64d079]
/lib/x86_64-linux-gnu/libpthread.so.0(+0xf890)[0x7f6644371890]
/lib/x86_64-linux-gnu/libc.so.6(__poll+0x0)[0x7f66423a0ac0]
/usr/sbin/mysqld(vio_io_wait+0x86)[0xb77b66]
/usr/sbin/mysqld(vio_socket_io_wait+0x18)[0xb77bf8]
/usr/sbin/mysqld(vio_read+0xca)[0xb77cda]
/usr/sbin/mysqld[0x642203]
/usr/sbin/mysqld[0x6424f4]
/usr/sbin/mysqld(my_net_read+0x304)[0x6432e4]
/usr/sbin/mysqld(_Z10do_commandP3THD+0xca)[0x6d413a]
/usr/sbin/mysqld(_Z24do_handle_one_connectionP3THD+0x1a2)[0x69f962]
/usr/sbin/mysqld(handle_one_connection+0x40)[0x69fa00]
/usr/sbin/mysqld(pfs_spawn_thread+0x146)[0x8fbfe6]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x8064)[0x7f664436a064]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f66423a962d]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0): is an invalid pointer
Connection ID (thread ID): 32
Status: NOT_KILLED

Logs from sysbench –

FATAL: mysql_drv_query() returned error 2013 (Lost connection to MySQL server during query) for query 'SELECT SUM(k) FROM sbtest20 WHERE id BETWEEN 5008643 AND 5008742'
FATAL: mysql_drv_query() returned error 2013 (Lost connection to MySQL server during query) for query 'DELETE FROM sbtest4 WHERE id=5025943'
FATAL: mysql_drv_query() returned error 2013 (Lost connection to MySQL server during query) for query 'SELECT SUM(k) FROM sbtest15 WHERE id BETWEEN 5049412 AND 5049511'
FATAL: `thread_run' function failed: /usr/share/sysbench/oltp_common.lua:432: SQL error, errno = 2013, state = 'HY000': Lost connection to MySQL server during query
FATAL: `thread_run' function failed: /usr/share/sysbench/oltp_common.lua:487: SQL error, errno = 2013, state = 'HY000': Lost connection to MySQL server during query
FATAL: `thread_run' function failed: /usr/share/sysbench/oltp_common.lua:432: SQL error, errno = 2013, state = 'HY000': Lost connection to MySQL server during query
Error in my_thread_global_end(): 3 threads didn't exit
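
For reference, the benchmark was launched with roughly the following command; the table count, table size, and thread count are approximations inferred from the sbtest20 table name and the ~5M id values in the logs above, not the exact command line:

# rough sketch of the sysbench invocation; host, credentials, thread count
# and duration below are placeholders, not the exact values we used
sysbench /usr/share/sysbench/oltp_read_write.lua \
    --mysql-host=127.0.0.1 --mysql-user=sbtest --mysql-password=... \
    --mysql-db=sbtest --tables=20 --table-size=5000000 \
    --threads=64 --time=0 --report-interval=60 run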

Our new master also restarted today with a similar trace.

Can someone help us debug this?

We cannot enable the general query log because the load is very high and it fills up the disk, and we cannot reproduce the crash deterministically either.
MySQL version – 5.6.40-84.0-log
Debian kernel version – Linux version 3.16.0-6-amd64 (debian-kernel@lists.debian.org) (gcc version 4.9.2 (Debian 4.9.2-10+deb8u1) ) #1 SMP Debian 3.16.56-1+deb8u1 (2018-05-08)

Machine memory is 40 GB
InnoDB buffer pool is 30 GB

Best Answer

Because each event has /usr/sbin/mysqld(pfs_spawn_thread in its stack trace, consider the following for the [mysqld] section of your my.cnf:

max_connections=4000  # from 5000 until you get a handle on the problem
thread_cache_size=512  # for additional thread capacity
innodb_buffer_pool_size=25G  # from 30G to reduce RAM required for now
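
If you want to try the first two changes before editing my.cnf, both variables are dynamic and can be applied at runtime; a rough sketch (note that innodb_buffer_pool_size cannot be resized online in 5.6 and requires a restart):

-- check the current values first
SHOW GLOBAL VARIABLES WHERE Variable_name IN
    ('max_connections', 'thread_cache_size', 'innodb_buffer_pool_size');

-- apply the two dynamic settings without a restart
SET GLOBAL max_connections = 4000;
SET GLOBAL thread_cache_size = 512;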

Disclaimer: I am the content author of the website mentioned in my profile, where you will find free utility scripts and my contact information.