MySQL: Lock wait timeout exceeded

innodbMySQL

I have a developer that has been trying to alter a large table (~60M rows). Via LiquidBase/JDBC they're running

ALTER TABLE foo DROP FOREIGN KEY fk_foo_1;

Today while it was running I checked in on it periodically; everything looked normal, the query was running, in state "copying to tmp table", I could see the temp table on disk getting larger and larger (still plenty of free space). I could see other processes blocked waiting for this table to be unlocked. Finally after about 4.5 hours, they got the "lock wait timeout exceeded; try restarting transaction" error. This is actually the 2nd time they've tried, and it seems to fail right about when I would expect it to complete.

innodb_lock_wait_timeout is set to the default 50, I can't imagine it would run for so long to be affected by this. No errors logged, no deadlocks or other weirdness seen in 'show engine innodb status'. Can anyone help me with other ideas? I'm fairly stumped on this one.

thanks

Best Answer

This is definitely a head-scratcher, but here's one possibility, especially if you're running MySQL 5.1 or earlier. MySQL Bug #37346 addresses an issue where there are locks at two different layers that are (at least partially) unaware of each other's existence... an uncommitted transaction is holding locks inside the storage engine (invisible to MySQL's core), while the ALTER TABLE is holding a different kind of lock, outside the storage engine (invisible to InnoDB).

In the bug report, when the ALTER TABLE is almost done, it bumps into the inner locks and ultimately times out with the message you see.

This is referenced as "fixed" in the release notes for 5.5.3 but I didn't find it mentioned in 5.1.x, which suggests to me that it might still exist there. It seems like a long shot but it did seem worth mentioning.


Until I found the above, my initial thoughts had to do with the table that is referenced by fk_foo_1... I was going to speculate that InnoDB might be trying something unnecessary related to validation of the foreign key constraint (unnecessary, since the constraint is about to disappear)... or that an uncommitted update or delete on the referenced table might be setting locks on the table you're trying to alter... an update or delete "over there," needing to be validated or trying to cascade "over here," while the ALTER TABLE is underway, and not waiting as it should for the lock held by the ALTER TABLE.