Mysql Crash SIGSEGV signal 11, recovered after killing apport

crashMySQLmysql-5.6

My mysql instance (ver 5.6.33) on AWS ( r3.4xlarge CPU Cores : 16 , RAM : 120 GB
, filesystem : XFS ) is crashing with signal 11 .
The instance is EBS optimised with EBS for storage.

At around 9:04 IST, the apport began writing the crash report. It did not finish till 9:29 where we killed apport ( kill -9 46697) and mysql respawned on its own.

PFA the top when we were about to kill apport.
enter image description here

PFA cpu and available memory graphs, also active number of connections.
enter image description here
enter image description here
enter image description here

Additional Info

#uname -a
Linux 3.13.0-92-generic #139-Ubuntu SMP Tue Jun 28 20:42:26 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

# lsb_release -a
Distributor ID: Ubuntu
Description:    Ubuntu 14.04.5 LTS
Release:    14.04
Codename:   trusty

Crash dump in /var/crash/

ProblemType: Crash
Architecture: amd64
Date: Mon Nov 21 09:04:26 2016
DistroRelease: Ubuntu 14.04
ExecutablePath: /usr/sbin/mysqld
ExecutableTimestamp: 1474318064
ProcCmdline: /usr/sbin/mysqld
ProcCwd: /var/lib/mysql
ProcEnviron:
 PATH=(custom, no user)
 TERM=linux
ProcMaps:
.... --- 

ProcStatus:
Name:  mysqld
State: D (disk sleep)
Tgid:  1591
Ngid:  0
Pid:   1591
PPid:  1
TracerPid:     0
Uid:   109     109     109     109
Gid:   116     116     116     116
FDSize:        16384
Groups:        116
VmPeak:        104777096 kB
VmSize:        104214224 kB
VmLck:        0 kB
VmPin:        0 kB
VmHWM: 98602088 kB
VmRSS: 98469508 kB
VmData:        104160476 kB
VmStk:      136 kB
VmExe:    14068 kB
VmLib:     4564 kB
VmPTE:   197320 kB
VmSwap:               0 kB
Threads:       3363
SigQ:  7/983541
SigPnd:        0000000000000000
ShdPnd:        0000000000000000
SigBlk:        0000000000087007
SigIgn:        0000000000001000
SigCgt:        00000001800062e9
CapInh:        0000000000000000
CapPrm:        0000000000000000
CapEff:        0000000000000000
CapBnd:        0000001fffffffff
Seccomp:       0
Cpus_allowed:  ffffffff,ffffffff,ffffffff,ffffffff
Cpus_allowed_list:     0-127
Mems_allowed:  00000000,00000001
Mems_allowed_list:     0
voluntary_ctxt_switches:       4249548
nonvoluntary_ctxt_switches:    168757
Signal: 11
Uname: Linux 3.13.0-92-generic x86_64

Mysql ERROR log didn't have any crash report ( NO LOGS on 9:04 )

2016-11-21 09:29:51 47911 [Note] Failed to execute mysql_file_stat on file '/var/log/mysql/mysql-bin.006775'
2016-11-21 09:29:51 47911 [Note] Plugin 'FEDERATED' is disabled.
2016-11-21 09:29:51 47911 [ERROR] Function 'innodb' already exists
2016-11-21 09:29:51 47911 [Warning] Couldn't load plugin named 'innodb' with soname 'ha_innodb.so'.
2016-11-21 09:29:51 47911 [ERROR] Function 'federated' already exists
2016-11-21 09:29:51 47911 [Warning] Couldn't load plugin named 'federated' with soname 'ha_federated.so'.
2016-11-21 09:29:51 47911 [ERROR] Function 'blackhole' already exists
2016-11-21 09:29:51 47911 [Warning] Couldn't load plugin named 'blackhole' with soname 'ha_blackhole.so'.
2016-11-21 09:29:51 47911 [ERROR] Function 'archive' already exists
2016-11-21 09:29:51 47911 [Warning] Couldn't load plugin named 'archive' with soname 'ha_archive.so'.
2016-11-21 09:29:51 47911 [Note] InnoDB: Using atomics to ref count buffer pool pages
2016-11-21 09:29:51 47911 [Note] InnoDB: The InnoDB memory heap is disabled
2016-11-21 09:29:51 47911 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2016-11-21 09:29:51 47911 [Note] InnoDB: Memory barrier is not used
2016-11-21 09:29:51 47911 [Note] InnoDB: Compressed tables use zlib 1.2.8
2016-11-21 09:29:51 47911 [Note] InnoDB: Using Linux native AIO
2016-11-21 09:29:51 47911 [Note] InnoDB: Using CPU crc32 instructions
2016-11-21 09:29:51 47911 [Note] InnoDB: Initializing buffer pool, size = 60.0G
2016-11-21 09:29:53 47911 [Note] InnoDB: Completed initialization of buffer pool
2016-11-21 09:29:54 47911 [Note] InnoDB: Highest supported file format is Barracuda.
2016-11-21 09:29:54 47911 [Note] InnoDB: Log scan progressed past the checkpoint lsn 22930589928913
2016-11-21 09:29:54 47911 [Note] InnoDB: Database was not shutdown normally!
2016-11-21 09:29:54 47911 [Note] InnoDB: Starting crash recovery.
2016-11-21 09:29:54 47911 [Note] InnoDB: Reading tablespace information from the .ibd files...
2016-11-21 09:29:56 47911 [Note] InnoDB: Restoring possible half-written data pages
2016-11-21 09:29:56 47911 [Note] InnoDB: from the doublewrite buffer...
InnoDB: Doing recovery: scanned up to log sequence number 22930595171328
InnoDB: Doing recovery: scanned up to log sequence number 22930600414208
InnoDB: Doing recovery: scanned up to log sequence number 22930605657088
InnoDB: Doing recovery: scanned up to log sequence number 22930610899968
InnoDB: Doing recovery: scanned up to log sequence number 22930616142848
InnoDB: Doing recovery: scanned up to log sequence number 22930621385728
InnoDB: Doing recovery: scanned up to log sequence number 22930626628608
InnoDB: Doing recovery: scanned up to log sequence number 22930631871488
InnoDB: Doing recovery: scanned up to log sequence number 22930637114368
InnoDB: Doing recovery: scanned up to log sequence number 22930642357248
InnoDB: Doing recovery: scanned up to log sequence number 22930647600128
InnoDB: Doing recovery: scanned up to log sequence number 22930652843008
InnoDB: Doing recovery: scanned up to log sequence number 22930658085888
InnoDB: Doing recovery: scanned up to log sequence number 22930663328768
InnoDB: Doing recovery: scanned up to log sequence number 22930668571648
InnoDB: Doing recovery: scanned up to log sequence number 22930673814528
InnoDB: Doing recovery: scanned up to log sequence number 22930679057408
InnoDB: Doing recovery: scanned up to log sequence number 22930684300288
InnoDB: Doing recovery: scanned up to log sequence number 22930689543168
InnoDB: Doing recovery: scanned up to log sequence number 22930694786048
InnoDB: Doing recovery: scanned up to log sequence number 22930700028928
InnoDB: Doing recovery: scanned up to log sequence number 22930705271808
InnoDB: Doing recovery: scanned up to log sequence number 22930710514688
InnoDB: Doing recovery: scanned up to log sequence number 22930715757568
InnoDB: Doing recovery: scanned up to log sequence number 22930721000448
InnoDB: Doing recovery: scanned up to log sequence number 22930726243328
InnoDB: Doing recovery: scanned up to log sequence number 22930731486208
InnoDB: Doing recovery: scanned up to log sequence number 22930736729088
InnoDB: Doing recovery: scanned up to log sequence number 22930741971968
InnoDB: Doing recovery: scanned up to log sequence number 22930747214848
InnoDB: Doing recovery: scanned up to log sequence number 22930752457728
InnoDB: Doing recovery: scanned up to log sequence number 22930757700608
InnoDB: Doing recovery: scanned up to log sequence number 22930762943488
InnoDB: Doing recovery: scanned up to log sequence number 22930763892518
InnoDB: Transaction 48125747425 was in the XA prepared state.
InnoDB: 1 transaction(s) which must be rolled back or cleaned up
InnoDB: in total 0 row operations to undo
InnoDB: Trx id counter is 48125747712
2016-11-21 09:30:02 47911 [Note] InnoDB: Starting an apply batch of log records to the database...
InnoDB: Progress in percent: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 2

SYSLOG at that time

    Nov 21 09:05:01 localhost CRON[46747]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
    Nov 21 09:15:01 localhost CRON[47316]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
    Nov 21 09:17:01 localhost CRON[47334]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
    Nov 21 09:19:37 localhost dhclient: DHCPREQUEST of 10.0.42.189 on eth0 to 10.0.40.1 port 67 (xid=0x2441a7cd)
    Nov 21 09:19:37 localhost dhclient: DHCPACK of 10.0.42.189 from 10.0.40.1
    Nov 21 09:19:37 localhost dhclient: bound to 10.0.42.189 -- renewal in 1592 seconds.
    Nov 21 09:25:01 localhost CRON[47648]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
    Nov 21 09:29:50 localhost kernel: [2888896.849170] init: mysql main process (1591) killed by SEGV signal
    Nov 21 09:29:50 localhost kernel: [2888896.849184] init: mysql main process ended, respawning
    Nov 21 09:29:50 localhost kernel: [2888896.886594] audit_printk_skb: 18 callbacks suppressed
    Nov 21 09:29:50 localhost kernel: [2888896.886598] type=1400 audit(1479700790.336:18): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="/usr/sbin/mysqld" pid=47900 comm="apparmor_parser"
    Nov 21 09:30:12 localhost /etc/mysql/debian-start[48231]: Upgrading MySQL tables if necessary.
    Nov 21 09:30:12 localhost /etc/mysql/debian-start[48235]: /usr/bin/mysql_upgrade: the '--basedir' option is always ignored
    Nov 21 09:30:12 localhost /etc/mysql/debian-start[48235]: Looking for 'mysql' as: /usr/bin/mysql
    Nov 21 09:30:12 localhost /etc/mysql/debian-start[48235]: Looking for 'mysqlcheck' as: /usr/bin/mysqlcheck
    Nov 21 09:30:12 localhost /etc/mysql/debian-start[48235]: Error: Failed while fetching Server version! Could be due to unauthorized access.
    Nov 21 09:30:12 localhost /etc/mysql/debian-start[48235]: FATAL ERROR: Upgrade failed
    Nov 21 09:30:12 localhost /etc/mysql/debian-start[48245]: Checking for insecure root accounts.

my.cnf file

[client]
port        = 3306
socket      = /var/run/mysqld/mysqld.sock

[mysqld_safe]
socket      = /var/run/mysqld/mysqld.sock
nice        = 0

[mysqld]
server-id   = 42189
user        = mysql
pid-file    = /var/run/mysqld/mysqld.pid
socket      = /var/run/mysqld/mysqld.sock
port        = 3306
basedir     = /usr
datadir     = /var/lib/mysql
tmpdir      = /tmp
lc-messages-dir = /usr/share/mysql
skip-external-locking

max-connect-errors = 100000

key_buffer_size     = 30M
max_allowed_packet  = 16M
thread_stack        = 192K
thread_cache_size       = 8

myisam-recover-options  = BACKUP

query_cache_type        = 2
query_cache_limit   = 2M
query_cache_size        = 64M

log_error = /var/log/mysql/error.log

slow_query_log_file  = '/var/log/mysql/slow.log'
slow_query_log       = 1
log_output           = 'FILE'
long_query_time      = 2

expire_logs_days    = 7
max_binlog_size         = 100M
binlog_format       = mixed
log-bin         = /var/log/mysql/mysql-bin.log
log_slave_updates       = 1

innodb_print_all_deadlocks=1

sql_mode            = NO_ENGINE_SUBSTITUTION,STRICT_TRANS_TABLES
innodb_buffer_pool_size     = 60G
innodb_buffer_pool_instances    = 20
innodb_log_file_size        = 500M
innodb_log_buffer_size      = 50M
innodb_flush_log_at_trx_commit  = 2

innodb_buffer_pool_dump_at_shutdown     = 1
innodb_buffer_pool_load_at_startup      = 1

relay_log_recovery=1
relay_log_info_repository=TABLE

max_connections         = 8000
max_heap_table_size     = 64M
tmp_table_size          = 64M
sort_buffer_size        = 256K
join_buffer_size        = 512K

table_open_cache        = 4000

[mysqldump]
quick
quote-names
max_allowed_packet  = 16M

[isamchk]
key_buffer_size     = 30M

[mysqld_safe]
syslog

Update :
See comments by Michael ( https://dba.stackexchange.com/users/11651/michael-sqlbot ) in the question. We shut down apport and although we did not find a solution of SIGSEGV , apport did not make things worse.

Best Answer

A "load average" of 3000 says says

  • max_connections is set unreasonably high, and
  • the clients are asking for too many connections, and
  • there are sloooow queries

So...

  • Decrease max_connections to, say, 1000. Meanwhile provide my.cnf; there may be other things that are set unreasonably.
  • Explain what your clients are like. For example, 100 Apache clients with MaxClients=100 would threaten to make 10,000 connections -- which is not sustainable.
  • Set long_query_time=1, turn on the slowlog, wait a day, run pt-query-digest on the slowlog, and let's see the 'worst' queries, together with SHOW CREATE TABLE. It may be something as simple as adding selected composite index(es) to table(s).
  • How much data do you have? What is the setting of innodb_buffer_pool_size? It sounds like that could be increased, but it may not need to be.