MySQL 5.5.28 (InnoDB) – signal 11 segmentation faults under high load (my.cnf included)

crash innodb MySQL recovery

We have a high-end server (128 GB RAM, 32-core Xeon, SSD RAID 10) running Ubuntu 12.04 with MySQL 5.5.28.
We are doing random imports into large InnoDB tables (50+ GB). After a few hours of heavy load, at no predictable point, mysqld gets a signal 11 and crashes.

We have tried to move hardware. A full dump (which we have not tried restoring yet) completes without issues.
Wouldn't a dump usually fail on corrupted tables?

Below are the crash log and my.cnf.

17:48:34 UTC - mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.

key_buffer_size=536870912
read_buffer_size=131072
max_used_connections=324
max_threads=200
thread_count=308
connection_count=308
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 965187 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x7fc7eb1b5040
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7fadf6abfe60 thread_stack 0x30000
/usr/sbin/mysqld(my_print_stacktrace+0x29)[0x7fc758522759]
/usr/sbin/mysqld(handle_fatal_signal+0x483)[0x7fc7583e9ae3]
/lib/x86_64-linux-gnu/libpthread.so.0(+0xfcb0)[0x7fc75713bcb0]
/usr/sbin/mysqld(+0x6671b0)[0x7fc75863a1b0]
/usr/sbin/mysqld(+0x61d6b9)[0x7fc7585f06b9]
/usr/sbin/mysqld(+0x630d12)[0x7fc758603d12]
/usr/sbin/mysqld(+0x6319c2)[0x7fc7586049c2]
/usr/sbin/mysqld(+0x631d85)[0x7fc758604d85]
/usr/sbin/mysqld(+0x626e7d)[0x7fc7585f9e7d]
/usr/sbin/mysqld(+0x633cea)[0x7fc758606cea]
/usr/sbin/mysqld(+0x6347e2)[0x7fc7586077e2]
/usr/sbin/mysqld(+0x624426)[0x7fc7585f7426]
/usr/sbin/mysqld(+0x610871)[0x7fc7585e3871]
/usr/sbin/mysqld(+0x5d4cb0)[0x7fc7585a7cb0]
/usr/sbin/mysqld(+0x5b7c9c)[0x7fc75858ac9c]
/usr/sbin/mysqld(_ZN7handler21read_multi_range_nextEPP18st_key_multi_range+0x24)[0x7fc7583e9fe4]
/usr/sbin/mysqld(_ZN18QUICK_RANGE_SELECT8get_nextEv+0x3c)[0x7fc7584a3c8c]
/usr/sbin/mysqld(+0x4e9195)[0x7fc7584bc195]
/usr/sbin/mysqld(_Z10sub_selectP4JOINP13st_join_tableb+0x71)[0x7fc7582f1741]
/usr/sbin/mysqld(+0x32f025)[0x7fc758302025]
/usr/sbin/mysqld(_ZN4JOIN4execEv+0x4a5)[0x7fc758311155]
/usr/sbin/mysqld(_Z12mysql_selectP3THDPPP4ItemP10TABLE_LISTjR4ListIS1_ES2_jP8st_orderSB_S2_SB_yP13select_resultP18st_select_lex_unitP13st_select_lex+0x130)[0x7fc75830d000]
/usr/sbin/mysqld(_Z13handle_selectP3THDP3LEXP13select_resultm+0x17c)[0x7fc758312f5c]
/usr/sbin/mysqld(+0x2f66b4)[0x7fc7582c96b4]
/usr/sbin/mysqld(_Z21mysql_execute_commandP3THD+0x16d8)[0x7fc7582d1118]
/usr/sbin/mysqld(_Z11mysql_parseP3THDPcjP12Parser_state+0x10f)[0x7fc7582d5daf]
/usr/sbin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcj+0x1380)[0x7fc7582d7200]
/usr/sbin/mysqld(_Z24do_handle_one_connectionP3THD+0x1bd)[0x7fc75837b7ad]
/usr/sbin/mysqld(handle_one_connection+0x50)[0x7fc75837b810]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x7e9a)[0x7fc757133e9a]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7fc756864cbd]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (7faa4c18e440): is an invalid pointer
Connection ID (thread ID): 2286
Status: NOT_KILLED

The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
121120 12:48:48 [Note] Plugin 'FEDERATED' is disabled.
121120 12:48:48 InnoDB: The InnoDB memory heap is disabled
121120 12:48:48 InnoDB: Mutexes and rw_locks use GCC atomic builtins
121120 12:48:48 InnoDB: Compressed tables use zlib 1.2.3.4
121120 12:48:48 InnoDB: Initializing buffer pool, size = 96.0G
121120 12:48:56 InnoDB: Completed initialization of buffer pool
121120 12:48:57 InnoDB: highest supported file format is Barracuda.
InnoDB: Log scan progressed past the checkpoint lsn 1341738337497
121120 12:48:58  InnoDB: Database was not shut down normally!
InnoDB: Starting crash recovery.
InnoDB: Reading tablespace information from the .ibd files...
InnoDB: Restoring possible half-written data pages from the doublewrite
InnoDB: buffer...
InnoDB: Doing recovery: scanned up to log sequence number 1341743580160
InnoDB: Doing recovery: scanned up to log sequence number 1341748823040
InnoDB: Doing recovery: scanned up to log sequence number 1341754065920
InnoDB: Doing recovery: scanned up to log sequence number 1341759308800
InnoDB: Doing recovery: scanned up to log sequence number 1341764551680
...

my.cnf:

[client]
port            = 3306
socket          = /var/run/mysqld/mysqld.sock


[mysqld_safe]
socket          = /var/run/mysqld/mysqld.sock
nice            = 0

[mysqld]
skip-name-resolve
innodb_file_per_table
default_storage_engine=InnoDB

user            = mysql
socket          = /var/run/mysqld/mysqld.sock
port            = 3306
basedir         = /usr
datadir         = /data/mysql
tmpdir          = /tmp
skip-external-locking

key_buffer              = 512M
max_allowed_packet      = 128M
thread_stack            = 192K
thread_cache_size       = 64

myisam-recover         = BACKUP
max_connections        = 500
table_cache            = 812
table_definition_cache = 812

#query_cache_limit       = 4M
#query_cache_size        = 512M
join_buffer_size        = 512K


innodb_additional_mem_pool_size = 20M
innodb_buffer_pool_size = 96G
#innodb_file_io_threads = 4
#innodb_thread_concurrency = 12
innodb_flush_log_at_trx_commit = 1
innodb_log_buffer_size = 8M
innodb_log_file_size = 1024M
innodb_log_files_in_group = 2
innodb_max_dirty_pages_pct = 90
innodb_lock_wait_timeout = 120

log_error                = /var/log/mysql/error.log

long_query_time =       5
slow_query_log  =       1
slow_query_log_file     =       /var/log/mysql/slowlog.log

[mysqldump]
quick
quote-names
max_allowed_packet      = 16M

[mysql]

[isamchk]
key_buffer              = 16M

Memory usage and disk space are fine.
This only happens during high I/O.

Could anything in the my.cnf file be causing this issue?

tia.

Best Answer

It seems theoretically possible that a table could still dump properly if the corruption were in the indexes, since index data isn't written to a dump.

It should not be possible for anything in your configuration to cause MySQL to crash with signal 11 (a segmentation fault).

I've been staring at this for a while, now, and I haven't come up with answers... just questions (in no particular order):

  • have you run memory diagnostics on your server? You mentioned that you "tried to move hardware" but you also mention having not tried a restore of your dump, so I'm not clear exactly what you tried moving. Resist the temptation to think "it can't be that." Test the memory.
  • is your system using any swap space at all? Hopefully not -- but if (and only if) it is, then you should reduce the innodb_buffer_pool_size to the point that it isn't ... because there's not really a point in buffering to memory that gets swapped, and the swap partition could be introducing problems. This one is a stretch, but worth eliminating, I think.
  • is this a problem that occurred after an upgrade to 5.5.28 or is this a new application or deployment?
  • if it's new, have you tried replicating the problem with MySQL 5.6?
  • is partitioning involved? That means touching more code.
  • are you using a binary distribution of MySQL that you downloaded from Oracle (tar/deb/rpm)? Or is it from Ubuntu, or another source? Or compiled from source code? (I always use the generic tar binaries, so I don't know what the current version of MySQL 5.5 is in 12.04 LTS.)
  • are you using any unusual plugins or UDFs?
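
The swap question in the list above can be answered from /proc/meminfo. A minimal shell sketch, assuming a Linux host; the function name swap_used_kb is made up for illustration:

```shell
# swap_used_kb [FILE]
#   Print the amount of swap in use, in kB, read from a /proc/meminfo-style
#   file. Defaults to /proc/meminfo (Linux only).
swap_used_kb() {
    awk '/^SwapTotal:/ { swap_total = $2 }
         /^SwapFree:/  { swap_free  = $2 }
         END           { print swap_total - swap_free }' "${1:-/proc/meminfo}"
}

# Report the current value when run on a Linux host.
if [ -r /proc/meminfo ]; then
    echo "swap in use: $(swap_used_kb) kB"
fi
```

A nonzero value that persists under load would be the cue to lower innodb_buffer_pool_size, as suggested above. Running vmstat 5 and watching the si/so columns shows whether pages are actively being swapped in and out.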

This could be a bug, but when you hear the sound of hooves, suspect horses before zebras (at least where I come from).


update (from comments):

"Another" memory bug?

Checking the memory would be the first thing I would try, for sure.

The snapshots should be getting you a reliable backup, I agree, but if there's any kind of binary weirdness going on in your files, it would be perfectly replicated. It will take some time, but restoring to a fresh system from mysqldump files would be a better test, since all of the table structures would be rebuilt from scratch. Since the table structures seem to be valid, it may be unlikely that this will change anything, but it feels like you're at the point where every possibility needs to be pinned down... clearly, what you're seeing should not be happening.

For a new test system, though, I would install the server using the "Linux - Generic 2.6 (x86, 64-bit), Compressed TAR Archive" package from the download site. Download the tarball, verify its md5 checksum, then tar xvzf it into /usr/local and symlink the resulting directory to /usr/local/mysql. (I think Ubuntu still puts its data in /var/lib/mysql, so you can probably do this even without removing the distro version, as long as the other copy isn't running.) Then move the "data" directory from inside /usr/local/mysql to whatever partition it needs to live on (if different), and symlink it back to /usr/local/mysql/data. Put your config file at /usr/local/mysql/my.cnf and pass --defaults-file=/usr/local/mysql/my.cnf as the first option, both when running the install scripts and when starting the server; this keeps any other my.cnf files (such as those in /etc) from being read.
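
The unpack, symlink and move steps above could be scripted roughly as follows. This is only a sketch: the function name unpack_mysql, the tarball name and the data path in the comments are assumptions, so adjust them to what you actually downloaded and to your own layout.

```shell
# unpack_mysql TARBALL PREFIX DATADIR
#   Unpack TARBALL under PREFIX, symlink PREFIX/mysql to the unpacked
#   directory, move its "data" directory to DATADIR (which must not exist
#   yet) and symlink it back into place.
unpack_mysql() {
    tarball=$1 prefix=$2 datadir=$3
    # Name of the top-level directory inside the archive.
    topdir=$(tar tzf "$tarball" | head -n 1 | cut -d/ -f1)
    tar xzf "$tarball" -C "$prefix" || return 1
    ln -s "$prefix/$topdir" "$prefix/mysql" || return 1
    mv "$prefix/mysql/data" "$datadir" || return 1
    ln -s "$datadir" "$prefix/mysql/data"
}

# Example invocation (hypothetical tarball name and data path; run as root):
#   md5sum mysql-5.5.28-linux2.6-x86_64.tar.gz   # compare with the download page
#   unpack_mysql mysql-5.5.28-linux2.6-x86_64.tar.gz /usr/local /data/mysql-generic
#   cd /usr/local/mysql
#   scripts/mysql_install_db --defaults-file=/usr/local/mysql/my.cnf --user=mysql
#   bin/mysqld_safe --defaults-file=/usr/local/mysql/my.cnf &
```

Note that basedir and datadir in the my.cnf you pass would need to point at the new locations (the file in the question has basedir = /usr), or the generic server will not find its own files.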

The rest of the setup is pretty straightforward. It's more work, but it completely eliminates the "black box" of using the package manager. The real motivation here, though, is that the distro packages may have been compiled from source, and the resulting binaries could differ slightly from the "official" Oracle binaries.
