MySQL InnoDB corruption after server crash during concurrent TRUNCATE TABLE command

Tags: corruption, crash, index, innodb, mysql

My server crashed today, I think because of a concurrent TRUNCATE TABLE command on one of our InnoDB tables. The server restarts, but once it is up, every time I try to issue an SQL command I get the following error:

ERROR 2006 (HY000): MySQL server has gone away

This is what happened in the logs:

121206 01:11:12  mysqld restarted
121206  1:11:13  InnoDB: Started; log sequence number 275 559321759
InnoDB: !!! innodb_force_recovery is set to 1 !!!
121206  1:11:13 [Note] /usr/libexec/mysqld: ready for connections.
Version: '5.0.95-log'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  Source distribution
InnoDB: Error: trying to load index PRIMARY for table 
InnoDB: but the index tree has been freed!
121206  1:11:37 - mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose
the problem, but since we have already crashed, something is definitely wrong
and this may fail.

key_buffer_size=134217728
read_buffer_size=1048576
max_used_connections=1
max_connections=400
threads_connected=1
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_connections = 950272 K
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

thd=0x9900950
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
Cannot determine thread, fp=0x46353fa0, backtrace may not be correct.
Stack range sanity check OK, backtrace follows:
(nil)
New value of fp=0x9900950 failed sanity check, terminating stack trace!
Please read http://dev.mysql.com/doc/mysql/en/using-stack-trace.html and follow instructions on how to resolve the stack trace. Resolved
stack trace is much more helpful in diagnosing the problem, so please do
resolve it
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort...
thd->query at 0x993e500 =
thd->thread_id=1
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
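The memory estimate in the crash report is plain arithmetic. The report prints every term except sort_buffer_size, so that value can be backed out of the total. A minimal Python sketch, assuming the report's "K" means 1024 bytes; the constants are copied from the log above:

```python
# Values copied from the crash report (bytes), except the total, which is in K.
KEY_BUFFER_SIZE = 134217728
READ_BUFFER_SIZE = 1048576
MAX_CONNECTIONS = 400
REPORTED_TOTAL_K = 950272  # "... *max_connections = 950272 K"


def worst_case_k(key_buffer: int, read_buffer: int, sort_buffer: int, max_conn: int) -> int:
    """key_buffer_size + (read_buffer_size + sort_buffer_size) * max_connections, in K."""
    return (key_buffer + (read_buffer + sort_buffer) * max_conn) // 1024


def solve_sort_buffer(total_k: int, key_buffer: int, read_buffer: int, max_conn: int) -> int:
    """Back sort_buffer_size (bytes) out of the reported total."""
    per_connection = (total_k * 1024 - key_buffer) // max_conn
    return per_connection - read_buffer


sort_buffer = solve_sort_buffer(REPORTED_TOTAL_K, KEY_BUFFER_SIZE, READ_BUFFER_SIZE, MAX_CONNECTIONS)
print("sort_buffer_size =", sort_buffer)
print("estimate =", worst_case_k(KEY_BUFFER_SIZE, READ_BUFFER_SIZE, sort_buffer, MAX_CONNECTIONS), "K")
```

With these inputs sort_buffer_size works out to 1048576 (1M), and the formula reproduces the logged 950272 K (928 MiB). Note that the formula as printed does not include InnoDB memory such as innodb_buffer_pool_size.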

I have searched online, and the hints suggest it is a MySQL bug, but I have no idea how to solve it. I am using MySQL version 5.0.95.

It seems like I have to create a new database and dump the old data into the new one, but how can I do that if I can't even issue any SQL commands to the current one?

— UPDATE —
Version: '5.0.95-log' socket: '/var/lib/mysql/mysql.sock' port: 3306 Source distribution
InnoDB: Error: trying to load index PRIMARY for table
InnoDB: but the index tree has been freed!
121206  4:13:41 - mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose
the problem, but since we have already crashed, something is definitely wrong
and this may fail.

key_buffer_size=134217728
read_buffer_size=1048576
max_used_connections=1
max_connections=400
threads_connected=1
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_connections = 950272 K
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

thd=0x17fb8950
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
Cannot determine thread, fp=0x464a3fa0, backtrace may not be correct.
Stack range sanity check OK, backtrace follows:
(nil)
New value of fp=0x17fb8950 failed sanity check, terminating stack trace!
Please read http://dev.mysql.com/doc/mysql/en/using-stack-trace.html and follow instructions on how to resolve the stack trace. Resolved
stack trace is much more helpful in diagnosing the problem, so please do
resolve it
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort...
thd->query at 0x17ff6500 =
thd->thread_id=3
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.

Number of processes running now: 0
121206 04:13:41  mysqld restarted
InnoDB: The log sequence number in ibdata files does not match
InnoDB: the log sequence number in the ib_logfiles!
121206  4:13:42  InnoDB: Database was not shut down normally!
InnoDB: Starting crash recovery.
InnoDB: Reading tablespace information from the .ibd files...
InnoDB: Restoring possible half-written data pages from the doublewrite
InnoDB: buffer...
121206  4:13:43  InnoDB: Started; log sequence number 275 559323148
121206  4:13:43 [Note] /usr/libexec/mysqld: ready for connections.
Version: '5.0.95-log'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  Source distribution
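In MySQL 5.0, InnoDB prints a log sequence number as two 32-bit words (high word, then low word), so "275 559321759" is a single 64-bit counter of redo bytes. A small Python sketch, assuming that two-word format, that turns the values from the two startups into integers so they can be compared:

```python
def lsn_from_words(text: str) -> int:
    """Combine a MySQL 5.0 style 'high low' LSN into one 64-bit integer."""
    high, low = (int(part) for part in text.split())
    if not (0 <= high < 2**32 and 0 <= low < 2**32):
        raise ValueError(f"not a two-word LSN: {text!r}")
    return (high << 32) | low


first_start = lsn_from_words("275 559321759")   # 01:11:13 startup
second_start = lsn_from_words("275 559323148")  # 04:13:43 startup
print(first_start, second_start, second_start - first_start)
```

The two startups differ by only 1389 bytes of redo, so very little was written between them.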

Best Answer

ASPECT #1

The first thing that caught my eye was this line:

InnoDB: Error: trying to load index PRIMARY for table /

This indicates you have a table using the InnoDB storage engine.

What is interesting about InnoDB is the way a PRIMARY KEY is stored: the table's rows live inside the PRIMARY KEY's own index tree, known as the clustered index. (When a table has no PRIMARY KEY or suitable UNIQUE key, InnoDB generates a hidden clustered index named GEN_CLUST_INDEX.)

My immediate guess is that a certain PRIMARY KEY entry is too big.

Please consider some articles on the good, the bad, and the ugly of using long PRIMARY KEYs:

Then see if <DB Hidden>.<Table Hidden> needs to be redesigned.
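The cost of a long PRIMARY KEY compounds because every InnoDB secondary index record also stores the primary key columns, so the key length is paid once in the clustered index and again in each secondary index. A back-of-the-envelope Python sketch; the row count, index count, and key widths are hypothetical illustration values, not numbers from this server, and per-record overhead is ignored:

```python
def pk_copy_bytes(rows: int, pk_bytes: int, secondary_indexes: int) -> int:
    """Bytes spent storing primary-key values: once in the clustered index
    plus once per secondary index (each secondary record carries the PK).
    Ignores record headers, page fill factor and non-leaf pages."""
    return rows * pk_bytes * (1 + secondary_indexes)


ROWS = 10_000_000       # hypothetical row count
SECONDARY_INDEXES = 3   # hypothetical number of secondary indexes

for label, width in (("4-byte INT key", 4), ("255-byte key", 255)):
    mib = pk_copy_bytes(ROWS, width, SECONDARY_INDEXES) / 2**20
    print(f"{label}: about {mib:,.0f} MiB of key copies")
```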

ASPECT #2

As for your conjecture about a concurrent TRUNCATE TABLE, that does sound dangerous. Why? InnoDB performs TRUNCATE TABLE as DDL, not DML. I have written about this before:

Jul 09, 2012 : What can cause TRUNCATE TABLE to take a really long time?
Jan 17, 2012 : Problem with InnoDB "per table" file sizes
Sep 28, 2011 : How to Recover an InnoDB table whose files were moved around

ASPECT #3

Some tuning suggestions

Please add the following to my.cnf

[mysqld]
max_allowed_packet=1G
innodb_fast_shutdown=0
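Before restarting, the size values can be checked: option files accept K, M, and G suffixes, and max_allowed_packet cannot exceed 1G in MySQL 5.0. A minimal Python sketch of such a check; parse_size and check_max_allowed_packet are hypothetical helpers, not part of any MySQL tool:

```python
import re

# Multipliers for the K/M/G suffixes accepted in my.cnf (case-insensitive).
_SUFFIX = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3}
MAX_ALLOWED_PACKET_CEILING = 1024**3  # 1G, the largest packet MySQL 5.0 supports


def parse_size(value: str) -> int:
    """Convert an option value such as '1G' or '512M' to bytes."""
    match = re.fullmatch(r"\s*(\d+)([KMGkmg]?)\s*", value)
    if match is None:
        raise ValueError(f"unrecognised size: {value!r}")
    return int(match.group(1)) * _SUFFIX[match.group(2).upper()]


def check_max_allowed_packet(value: str) -> int:
    """Return max_allowed_packet in bytes, rejecting values above 1G."""
    size = parse_size(value)
    if size > MAX_ALLOWED_PACKET_CEILING:
        raise ValueError(f"max_allowed_packet={value} is above the 1G ceiling")
    return size


print(check_max_allowed_packet("1G"))
```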

Start mysql

In another session, run tail -f <errorlogfile> and watch InnoDB Crash Recovery.
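Watching the error log can also be scripted. Below is a minimal Python stand-in for tail -f, plus a classifier that reports whether the most recent startup is still recovering, reached "ready for connections", or crashed again. The marker strings are taken from the logs in the question; the log path you pass to follow() is an assumption (check your my.cnf for where the error log lives):

```python
import time
from typing import Iterable, Iterator


def follow(path: str, poll_seconds: float = 1.0) -> Iterator[str]:
    """Yield lines appended to path, roughly like `tail -f`.
    Starts at the current end of file (no backlog is printed), does not
    handle log rotation, and does not reassemble partially written lines."""
    with open(path, "r", errors="replace") as fh:
        fh.seek(0, 2)  # jump to the end of the file
        while True:
            line = fh.readline()
            if line:
                yield line.rstrip("\n")
            else:
                time.sleep(poll_seconds)


def startup_state(lines: Iterable[str]) -> str:
    """Classify the latest startup from error-log lines."""
    state = "unknown"
    for line in lines:
        if "Starting crash recovery" in line:
            state = "recovering"
        elif "ready for connections" in line:
            state = "ready"
        elif "got signal" in line:
            state = "crashed"
    return state
```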

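Once mysql is fully back up and InnoDB crash recovery has completed, shut mysql down immediately. With innodb_fast_shutdown=0, InnoDB does a full purge and insert buffer merge before exiting, so the shutdown is slow but clean. You may need to resize your InnoDB transaction logs.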

Sorry for these wild suggestions, but I am flying blind here.

Please post the following in the question:

your entire my.cnf
how much RAM is on board

UPDATE 2012-12-05 12:09 EDT

Please do the following:

STEP 01) Add these changes to my.cnf

[mysqld]
max_allowed_packet=1G
innodb_fast_shutdown=0
innodb_thread_concurrency=0

STEP 02) service mysql restart

to make sure mysql comes up

STEP 03) You need to resize ib_logfile0 and ib_logfile1 (24M might be too small)

service mysql stop
cd /var/lib/mysql
mv ib_logfile0 ib_logfile0.bak
mv ib_logfile1 ib_logfile1.bak
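The renames in STEP 03 can be scripted if you need to repeat them. A Python sketch, assuming the datadir is /var/lib/mysql (as in the commands above) and that mysqld has already been shut down cleanly; it refuses to overwrite existing .bak files and renames nothing unless both log files are present:

```python
from pathlib import Path

REDO_LOGS = ("ib_logfile0", "ib_logfile1")


def set_aside_redo_logs(datadir: str, suffix: str = ".bak") -> list:
    """Rename ib_logfile0/ib_logfile1 to <name><suffix>, like the mv commands above.

    Only safe after a clean shutdown (innodb_fast_shutdown=0): the redo logs
    of a server that crashed still hold changes crash recovery needs.
    Checks every file first so nothing is renamed if anything is off.
    """
    pairs = [(Path(datadir) / name, Path(datadir) / (name + suffix)) for name in REDO_LOGS]
    for src, dst in pairs:
        if not src.is_file():
            raise FileNotFoundError(src)
        if dst.exists():
            raise FileExistsError(dst)
    for src, dst in pairs:
        src.rename(dst)
    return [dst for _, dst in pairs]
```

Usage would be set_aside_redo_logs("/var/lib/mysql"), run after service mysql stop by a user that can write to the datadir.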

STEP 04) Add these changes to my.cnf

[mysqld]
innodb_log_file_size=512M
innodb_log_buffer_size=8M

STEP 05) service mysql start

mysqld will recreate ib_logfile0 and ib_logfile1 at 512M each.
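The new sizes can be sanity-checked before STEP 05: in MySQL 5.0 the combined size of the redo logs (innodb_log_file_size times innodb_log_files_in_group, which defaults to 2) must stay below 4G. A small Python sketch of that arithmetic; redo_log_total is a hypothetical helper name:

```python
# MySQL 5.0: innodb_log_file_size * innodb_log_files_in_group must stay
# below 4G (the real ceiling is slightly under 4G).
COMBINED_REDO_LIMIT = 4 * 1024**3


def redo_log_total(log_file_size: int, files_in_group: int = 2) -> int:
    """Total redo log space mysqld will create; raises if it is 4G or more."""
    total = log_file_size * files_in_group
    if total >= COMBINED_REDO_LIMIT:
        raise ValueError(f"combined redo log size {total} bytes is not below 4G")
    return total


MIB = 1024**2
print("old:", redo_log_total(24 * MIB) // MIB, "M")   # 24M per file, as in STEP 03
print("new:", redo_log_total(512 * MIB) // MIB, "M")  # innodb_log_file_size=512M
```

Two 512M files give 1G of redo log, comfortably under the limit.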

Now try it and see what happens.

UPDATE 2012-12-05 12:18 EDT

In the meantime, please read my ServerFault post on the MySQL packet and its sizing implications for innodb_log_file_size and innodb_log_buffer_size, which I learned about from someone else's ServerFault post.

UPDATE 2012-12-05 14:28 EDT

I edited all references to customer tables out of this question.

The root cause was a damaged page in ibdata1 with data and index pages mixed inside. I helped Andrew migrate the data out and recreate ibdata1 with innodb_file_per_table enabled, and Andrew then reloaded the data.

Related Solutions

MySQL master binlog corruption

Surprisingly, that's not gibberish.

It does indeed appear at the top of the output whenever you run mysqlbinlog against a binary log generated by MySQL 5.1 or MySQL 5.5. You will not see that gibberish for binary logs from MySQL 5.0 and earlier.

This is why the starting position for replication from an empty binary log is:

107 for MySQL 5.5
106 for MySQL 5.1
98 for MySQL 5.0 and earlier
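Those start positions are simply the end of the first event in the file. A v4 binary log (MySQL 5.0 and later) opens with the 4-byte magic number \xfebin followed by a Format_description event, and that event grew between versions. A Python sketch that reads its length from the 19-byte v4 event header; the file path is yours to supply, and the test data is synthetic rather than a real binlog:

```python
import struct

BINLOG_MAGIC = b"\xfebin"
FORMAT_DESCRIPTION_EVENT = 15
# v4 common event header: timestamp, type_code, server_id, event_length,
# next_position, flags (19 bytes, little-endian, no padding).
EVENT_HEADER = struct.Struct("<IBIIIH")


def first_event_end(path: str) -> int:
    """Offset just past the Format_description event that opens a v4 binlog."""
    with open(path, "rb") as fh:
        head = fh.read(len(BINLOG_MAGIC) + EVENT_HEADER.size)
    if len(head) < len(BINLOG_MAGIC) + EVENT_HEADER.size or head[:4] != BINLOG_MAGIC:
        raise ValueError(f"{path} does not look like a v4 MySQL binary log")
    _ts, type_code, _server_id, event_length, _next_pos, _flags = EVENT_HEADER.unpack(head[4:])
    if type_code != FORMAT_DESCRIPTION_EVENT:
        raise ValueError(f"first event has type {type_code}, expected Format_description")
    return len(BINLOG_MAGIC) + event_length
```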

This is good to remember if you do MySQL replication where the master is MySQL 5.1 and the slave is MySQL 5.0. That could present a really big headache.

Replication from a 5.0 master to a 5.1 slave works fine, but not the other way around. (According to the MySQL documentation, replicating from a newer master to an older slave is generally not supported, for three reasons: 1) binary log format, 2) row-based replication, 3) SQL incompatibility.)

Anyway, run mysqlbinlog on the offending binary log on the master. If the resulting dump shows gibberish in the middle (which I have seen a couple of times in my DBA career), you may have to skip to position 98 (MySQL 5.0), 106 (MySQL 5.1), or 107 (MySQL 5.5) of the master's next binary log and start replicating from there. Then you have to deal with the changes the slave missed: either use the Maatkit tools mk-table-checksum and mk-table-sync to bring the slave back in line with the master (if you want to be a hero), or, even worse, mysqldump the master, reload the slave, and start replication totally over (if you don't want to be a hero).

If the mysqlbinlog output for the master's log is completely readable after the gibberish at the top, the master's binary log may be fine and the relay log on the slave corrupt instead (due to transmission/CRC errors). If that's the case, just reload the relay logs by issuing the CHANGE MASTER TO command as follows:

STOP SLAVE;
CHANGE MASTER TO
MASTER_HOST='< master-host ip or DNS >',
MASTER_PORT=3306,
MASTER_USER='< username >',
MASTER_PASSWORD='< password >',
MASTER_LOG_FILE='< MMMM >',
MASTER_LOG_POS=< PPPP >;
START SLAVE;

Where

MMMM is the master's binary log file containing the last event the slave executed
PPPP is the position in that file up to which the slave has executed (where the next event starts)

You can get MMMM and PPPP by running SHOW SLAVE STATUS\G and using

Relay_Master_Log_File for MMMM
Exec_Master_Log_Pos for PPPP
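Copying MMMM and PPPP by hand is error-prone, so here is a Python sketch that parses the \G output of SHOW SLAVE STATUS and fills in the CHANGE MASTER TO statement above. The host, user, and password arguments are placeholders you supply; nothing here connects to MySQL, and values are not escaped, so treat it as a template generator only:

```python
import re


def parse_slave_status(text: str) -> dict:
    r"""Parse the 'Name: value' lines printed by SHOW SLAVE STATUS\G."""
    status = {}
    for line in text.splitlines():
        match = re.match(r"\s*(\w+):\s?(.*)$", line)
        if match:
            status[match.group(1)] = match.group(2).strip()
    return status


def change_master_sql(status: dict, host: str, user: str, password: str, port: int = 3306) -> str:
    """Build CHANGE MASTER TO from Relay_Master_Log_File (MMMM) and
    Exec_Master_Log_Pos (PPPP). Values are NOT escaped."""
    log_file = status["Relay_Master_Log_File"]
    log_pos = int(status["Exec_Master_Log_Pos"])
    return (
        "CHANGE MASTER TO "
        f"MASTER_HOST='{host}', MASTER_PORT={port}, "
        f"MASTER_USER='{user}', MASTER_PASSWORD='{password}', "
        f"MASTER_LOG_FILE='{log_file}', MASTER_LOG_POS={log_pos};"
    )
```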

Try it out and let me know!!!

BTW, running CHANGE MASTER TO deletes the slave's existing relay logs and starts a fresh one.

The mysqld_safe version is different from the mysqld version

As we all know, mysqld_safe and mysqld are very different.

mysqld : The database server instance daemon

mysqld_safe : Control program that examines and sets the environment for mysqld to execute. The mysqld executable is actually launched in a loop. When mysqld terminates, mysqld_safe examines how it exited and decides what to do:

mysqld terminated normally (intentional shutdown): leave mysqld_safe
mysqld terminated abnormally (crash or kill -9 of mysqld): loop back and restart mysqld
- if mysqld fails on retry, leave mysqld_safe
- if mysqld starts up, stay in the mysqld_safe loop
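The loop described above can be modeled in a few lines of Python. This is a simplified model for illustration, not the actual mysqld_safe (which is a shell script): start and clean_shutdown are hypothetical callables, and max_restarts is an invented knob standing in for "mysqld fails on retry". The real script decides whether a shutdown was intentional by checking whether mysqld removed its pid file:

```python
from typing import Callable


def supervise(start: Callable[[], int], clean_shutdown: Callable[[], bool], max_restarts: int = 5) -> int:
    """Simplified model of the mysqld_safe restart loop.

    start() runs mysqld to completion and returns its exit status;
    clean_shutdown() reports whether that run ended intentionally.
    """
    restarts = 0
    while True:
        status = start()
        if clean_shutdown():
            return status      # intentional shutdown: leave the loop
        if restarts >= max_restarts:
            return status      # keeps failing on retry: give up
        restarts += 1          # crash or kill -9: loop back ("mysqld restarted")
```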

Why is it important to have mysqld and mysqld_safe using the same MySQL version?

Let me illustrate it this way: Percona Server's mysqld_safe sometimes has additional features for manipulating the OS. For example, I have seen numactl --interleave=all in a Percona Server mysqld_safe. Without that line, Percona Server's mysqld may run into memory and swapping issues.

The same could be true of Oracle's (ugh, still hate saying that) mysqld and mysqld_safe. There could be improvements from one major release to the next that would be lost if mysqld_safe were older.

Rather than exploring the possibilities of using an old mysqld_safe with a new mysqld (or vice versa), please make your life simple and reinstall MySQL 5.5.30 from scratch.

Before doing so, please run

updatedb
locate mysqld_safe

on Linux and see whether two copies are lingering. If there are, get the paths straightened out. Otherwise, you may have to reinstall MySQL 5.5.30.
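If locate is unavailable or its database is stale, a Python sketch like the one below lists every mysqld_safe on $PATH plus any extra directories you name. Unlike locate it does not scan the whole filesystem, and /usr/local/mysql/bin is only an assumed example of a tarball install location:

```python
import os
import stat
from pathlib import Path


def find_executables(name: str, extra_dirs=()) -> list:
    """List every executable file called `name` in $PATH plus extra_dirs,
    in search order, skipping duplicates that resolve to the same file."""
    seen, found = set(), []
    search = os.environ.get("PATH", "").split(os.pathsep) + list(extra_dirs)
    for directory in search:
        if not directory:
            continue
        candidate = Path(directory) / name
        try:
            st = candidate.stat()
        except OSError:
            continue  # missing or unreadable: skip
        is_exec = stat.S_ISREG(st.st_mode) and bool(st.st_mode & 0o111)
        real = candidate.resolve()
        if is_exec and real not in seen:
            seen.add(real)
            found.append(candidate)
    return found


copies = find_executables("mysqld_safe", extra_dirs=["/usr/local/mysql/bin"])
if len(copies) > 1:
    print("More than one mysqld_safe found:", *copies, sep="\n  ")
```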