Surprisingly, that's not gibberish.
It indeed appears at the top of the output whenever you run mysqlbinlog against a binary log generated by MySQL 5.1 or 5.5: it is the log's format description event, which mysqlbinlog prints base64-encoded as a BINLOG statement. You will not see that gibberish in binary logs from MySQL 5.0 and earlier.
This is why the start point for replication from an empty binary log is (you can verify this yourself, as sketched after this list):
- 107 for MySQL 5.5
- 106 for MySQL 5.1
- 98 for MySQL 5.0 and back
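A quick way to verify these offsets, assuming a binary log named mysql-bin.000001 (the name is hypothetical; substitute one of yours): the first event in any binary log is the format description event, and its End_log_pos is exactly the start point listed above.
SHOW BINLOG EVENTS IN 'mysql-bin.000001' LIMIT 1;
-- on MySQL 5.5 the first row is the Format_desc event with End_log_pos = 107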
This is good to remember if you run MySQL Replication where the Master is MySQL 5.1 and the Slave is MySQL 5.0. That combination can present a really big headache.
Replication from a 5.0 Master to a 5.1 Slave works fine, but not the other way around. (According to the MySQL documentation, replicating from a newer Master to an older Slave is generally not supported, for three reasons: 1) binary log format, 2) row-based replication, 3) SQL incompatibilities.)
Anyway, run mysqlbinlog on the offending binary log on the master. If the resulting dump produces gibberish in the middle (which I have seen a couple of times in my DBA career), you may have to skip to position 98 (MySQL 5.0), 106 (MySQL 5.1), or 107 (MySQL 5.5) of the master's next binary log and start replicating from there. Sadly, you may then need the MAATKIT tools mk-table-checksum and mk-table-sync to reload the master changes missing on the slave (if you want to be a hero); even worse, you may have to mysqldump the master, reload the slave, and start replication over from scratch (if you don't want to be a hero).
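If you do end up skipping to the master's next binary log, the repointing itself is simple. A minimal sketch, assuming a MySQL 5.5 master whose next binary log happens to be named mysql-bin.000124 (hypothetical; substitute your own file name, and position 106 or 98 for older versions). CHANGE MASTER TO only changes the options you specify, so the existing host/user/password settings are kept:
STOP SLAVE;
CHANGE MASTER TO
MASTER_LOG_FILE='mysql-bin.000124',
MASTER_LOG_POS=107;
START SLAVE;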
If the master's mysqlbinlog output is completely readable after the gibberish at the top, it is possible the master's binary log is fine but the relay log on the slave is corrupt (due to transmission/CRC errors). If that's the case, just reload the relay logs by issuing the CHANGE MASTER TO command as follows:
STOP SLAVE;
CHANGE MASTER TO
MASTER_HOST='< master-host ip or DNS >',
MASTER_PORT=3306,
MASTER_USER='< username >',
MASTER_PASSWORD='< password >',
MASTER_LOG_FILE='< MMMM >',
MASTER_LOG_POS=< PPPP >;
START SLAVE;
Where
- MMMM is the binary log file from the Master that was last processed on the Slave
- PPPP is the position in that file that was last processed on the Slave
You can get MMMM and PPPP by doing SHOW SLAVE STATUS\G and using
- Relay_Master_Log_File for MMMM
- Exec_Master_Log_Pos for PPPP
(A filled-in example follows this list.)
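For example, with hypothetical values read from SHOW SLAVE STATUS\G:
-- suppose SHOW SLAVE STATUS\G reported:
--   Relay_Master_Log_File: mysql-bin.000123
--   Exec_Master_Log_Pos: 4711
CHANGE MASTER TO
MASTER_LOG_FILE='mysql-bin.000123',
MASTER_LOG_POS=4711;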
Try it out and let me know!!!
BTW, running the CHANGE MASTER TO command erases the slave's current relay logs and starts fresh.
It seems theoretically possible that the table could still dump properly if the corruption were in the indexes, which aren't dumped.
It should not be possible for anything in your configuration to cause MySQL to crash with a Signal 11, a segmentation fault.
I've been staring at this for a while, now, and I haven't come up with answers... just questions (in no particular order):
- have you run memory diagnostics on your server? You mentioned that you "tried to move hardware" but you also mention having not tried a restore of your dump, so I'm not clear exactly what you tried moving. Resist the temptation to think "it can't be that." Test the memory.
- is your system using any swap space at all? Hopefully not -- but if (and only if) it is, then you should reduce the innodb_buffer_pool_size to the point that it isn't ... because there's not really a point in buffering to memory that gets swapped, and the swap partition could be introducing problems. This one is a stretch, but worth eliminating, I think.
- is this a problem that occurred after an upgrade to 5.5.28 or is this a new application or deployment?
- if it's new, have you tried replicating the problem with MySQL 5.6?
- is partitioning involved? That means touching more code.
- are you using a binary distribution of MySQL that you downloaded from Oracle (tar/deb/rpm)? Or is it from Ubuntu (I always use generic tar binaries, so I don't know what the current version of MySQL 5.5 is, in 12.04LTS) or another source? Or compiled from source code?
- are you using any unusual plugins or UDFs? (A couple of quick checks are sketched after this list.)
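A few of these can be checked straight from the SQL prompt; this is just a quick starting point, not an exhaustive diagnostic:
SHOW GLOBAL VARIABLES LIKE 'innodb_buffer_pool_size'; -- how big is the buffer pool, really?
SHOW PLUGINS; -- anything non-standard loaded?
SELECT * FROM mysql.func; -- any UDFs installed?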
This could be a bug, but when you hear the sound of hooves, suspect horses before zebras (at least where I come from).
update (from comments):
"Another" memory bug?
Checking the memory would be the first thing I would try, for sure.
The snapshots should be getting you a reliable backup, I agree, but if there's any kind of binary weirdness going on in your files, it would be perfectly replicated. It will take some time, but restoring to a fresh system from your mysqldump files would be a better test, since all of the table structures would be rebuilt from scratch. Since the table structures seem to be valid, this may be unlikely to change anything, but it feels like you're at the point where every possibility needs to be pinned down... clearly, what you're seeing should not be happening.
For a new test system, though, I would install the server using the "Linux - Generic 2.6 (x86, 64-bit), Compressed TAR Archive" package from the download site. Download the tarball, verify its MD5 checksum, then tar xvzf it into /usr/local and symlink the resulting directory to /usr/local/mysql. (I think Ubuntu still puts it in /var/lib/mysql, so you can probably do this even without removing the distro version, as long as you don't have the other copy running.) Then move the "data" directory from inside /usr/local/mysql to whatever partition it needs to live on (if different), and symlink it back into /usr/local/mysql/data. Put your config file at /usr/local/mysql/my.cnf and pass it as the first option, --defaults-file=/usr/local/mysql/my.cnf, when running the install scripts and when starting the server -- this causes any other my.cnf files (such as those in /etc) to not be read.
The rest of the setup is pretty straightforward. It's more work, but it completely eliminates the "black box" of using the package manager. The real motivation here, though, is that the distro packages may have been compiled from source, and the resulting binaries could have slight variations from the "official" Oracle binaries.
Best Answer
Let's elaborate on the error message.
InnoDB calls an .ibd file a "table space" or just "space" in this error message. So there is a space 89, and one of its pages refers to a non-existent page number 197105. The error message doesn't say which table corresponds to space 89. I think some MySQL versions do tell you, but it's also possible to find it in the InnoDB dictionary.
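On MySQL 5.6 and later the dictionary is exposed through INFORMATION_SCHEMA, so mapping the space id to a table name is a one-liner (on 5.5 and earlier you would need external tools to read the dictionary):
SELECT NAME, SPACE
FROM INFORMATION_SCHEMA.INNODB_SYS_TABLES
WHERE SPACE = 89;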
Now that we know which table is corrupt, the question is how to fix it. InnoDB doesn't heal tablespaces the way MyISAM does with REPAIR TABLE; the only way is to drop the table space (= the table) and re-create it. In this particular case it's not necessary to start MySQL in innodb_force_recovery mode. Just start it normally and drop the table. But first, you probably need to save the data in this table. There are two ways. First, you can choose ranges of non-corrupt primary key values and dump them into another table or a text dump. I had recovery cases like this one and wrote a script to do that. The script iterates over the PK ranges and dumps as many records as possible into another (MyISAM) table.
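A minimal sketch of that first approach, with hypothetical names: a corrupt InnoDB table t1 with an integer primary key id, salvaged into a MyISAM copy t1_salvage:
CREATE TABLE t1_salvage LIKE t1; -- same structure as the corrupt table
ALTER TABLE t1_salvage ENGINE=MyISAM; -- MyISAM target, as described above
-- copy in small PK ranges; narrow or skip any range that hits a corrupt page
INSERT INTO t1_salvage SELECT * FROM t1 WHERE id >= 0 AND id < 10000;
INSERT INTO t1_salvage SELECT * FROM t1 WHERE id >= 10000 AND id < 20000;
-- ... and so on across the whole PK range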
The second way requires more time and skill. You will need a tool (https://github.com/twindb/undrop-for-innodb) and a blog post (https://twindb.com/recover-corrupt-mysql-database/) that walks through the steps.
Then drop the corrupted table, create an empty one and load the recovered dump.
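Continuing with the hypothetical names from the sketch above, that final step would look roughly like this:
DROP TABLE t1; -- drops the corrupt tablespace with it
CREATE TABLE t1 LIKE t1_salvage; -- recreate the structure
ALTER TABLE t1 ENGINE=InnoDB; -- back to InnoDB (the salvage copy was MyISAM)
INSERT INTO t1 SELECT * FROM t1_salvage;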