Preface
Our application runs several threads that execute `DELETE` queries in parallel. The queries affect disjoint data, i.e. there should be no possibility of concurrent `DELETE`s hitting the same rows from separate threads. However, per the documentation, MySQL uses a so-called 'next-key' lock for `DELETE` statements, which locks both the matching key and a gap around it. This leads to deadlocks, and the only solution we have found is to use the READ COMMITTED isolation level.
The Problem
The problem arises when executing complex `DELETE` statements with `JOIN`s over huge tables. In our particular case we have a warnings table with only two rows, but the query needs to delete all warnings belonging to certain entities from two separate `INNER JOIN`ed tables. The query is as follows:
```sql
DELETE pw
FROM proc_warnings pw
INNER JOIN day_position dp
ON dp.transaction_id = pw.transaction_id
INNER JOIN ivehicle_days vd
ON vd.id = dp.ivehicle_day_id
WHERE vd.ivehicle_id=? AND dp.dirty_data=1
```
When the `day_position` table is large enough (in my test case it has 1448 rows), any transaction, even at the READ COMMITTED isolation level, blocks the entire `proc_warnings` table.
The issue is always reproducible on this sample data (http://yadi.sk/d/QDuwBtpW1BxB9) in both MySQL 5.1 (checked on 5.1.59) and MySQL 5.5 (checked on 5.5.24).
EDIT: The linked sample data also contains schema and indexes for the query tables, reproduced here for convenience:
```sql
CREATE TABLE `proc_warnings` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `transaction_id` int(10) unsigned NOT NULL,
  `warning` varchar(2048) NOT NULL,
  PRIMARY KEY (`id`),
  KEY `proc_warnings__transaction` (`transaction_id`)
);

CREATE TABLE `day_position` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `transaction_id` int(10) unsigned DEFAULT NULL,
  `sort_index` int(11) DEFAULT NULL,
  `ivehicle_day_id` int(10) unsigned DEFAULT NULL,
  `dirty_data` tinyint(4) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `day_position__trans` (`transaction_id`),
  KEY `day_position__is` (`ivehicle_day_id`,`sort_index`),
  KEY `day_position__id` (`ivehicle_day_id`,`dirty_data`)
);

CREATE TABLE `ivehicle_days` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `d` date DEFAULT NULL,
  `sort_index` int(11) DEFAULT NULL,
  `ivehicle_id` int(10) unsigned DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `ivehicle_days__is` (`ivehicle_id`,`sort_index`),
  KEY `ivehicle_days__d` (`d`)
);
```
The queries per transaction are as follows:
- Transaction 1

  ```sql
  set transaction isolation level read committed;
  set autocommit=0;
  begin;
  DELETE pw FROM proc_warnings pw
  INNER JOIN day_position dp ON dp.transaction_id = pw.transaction_id
  INNER JOIN ivehicle_days vd ON vd.id = dp.ivehicle_day_id
  WHERE vd.ivehicle_id=2 AND dp.dirty_data=1;
  ```
- Transaction 2

  ```sql
  set transaction isolation level read committed;
  set autocommit=0;
  begin;
  DELETE pw FROM proc_warnings pw
  INNER JOIN day_position dp ON dp.transaction_id = pw.transaction_id
  INNER JOIN ivehicle_days vd ON vd.id = dp.ivehicle_day_id
  WHERE vd.ivehicle_id=13 AND dp.dirty_data=1;
  ```
One of them always fails with a 'Lock wait timeout exceeded…' error. `information_schema.innodb_trx` contains the following rows:
| trx_id | trx_state | trx_started | trx_requested_lock_id | trx_wait_started | trx_weight | trx_mysql_thread_id | trx_query |
|---|---|---|---|---|---|---|---|
| '1A2973A4' | 'LOCK WAIT' | '2012-12-12 20:03:25' | '1A2973A4:0:3172298:2' | '2012-12-12 20:03:25' | '2' | '3089' | 'DELETE pw FROM proc_warnings pw INNER JOIN day_position dp ON dp.transaction_id = pw.transaction_id INNER JOIN ivehicle_days vd ON vd.id = dp.ivehicle_day_id WHERE vd.ivehicle_id=13 AND dp.dirty_data=1' |
| '1A296F67' | 'RUNNING' | '2012-12-12 19:58:02' | NULL | NULL | '7' | '3087' | NULL |
`information_schema.innodb_locks` contains:
| lock_id | lock_trx_id | lock_mode | lock_type | lock_table | lock_index | lock_space | lock_page | lock_rec | lock_data |
|---|---|---|---|---|---|---|---|---|---|
| '1A2973A4:0:3172298:2' | '1A2973A4' | 'X' | 'RECORD' | '`deadlock_test`.`proc_warnings`' | '`PRIMARY`' | '0' | '3172298' | '2' | '53' |
| '1A296F67:0:3172298:2' | '1A296F67' | 'X' | 'RECORD' | '`deadlock_test`.`proc_warnings`' | '`PRIMARY`' | '0' | '3172298' | '2' | '53' |
As far as I can see, both queries want an exclusive (X) lock on the row with primary key = 53. However, neither of them should delete rows from the `proc_warnings` table. I just don't understand why the index is locked. Moreover, the index is not locked when the `proc_warnings` table is empty, or when the `day_position` table contains fewer rows (e.g. one hundred).
A further investigation was to run `EXPLAIN` on the equivalent `SELECT` query. It shows that the query optimizer doesn't use an index on the `proc_warnings` table, and that's the only reason I can imagine for it locking the entire primary key index.
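For reference, a hedged sketch of that check; the `SELECT` here is my reconstruction from the `DELETE` above, with `ivehicle_id=2` filled in as an example value:

```sql
-- Equivalent SELECT for the DELETE above. In the EXPLAIN output, a row
-- for proc_warnings showing type: ALL and key: NULL means the table is
-- scanned instead of probed via the proc_warnings__transaction index.
EXPLAIN
SELECT pw.id
FROM proc_warnings pw
INNER JOIN day_position dp ON dp.transaction_id = pw.transaction_id
INNER JOIN ivehicle_days vd ON vd.id = dp.ivehicle_day_id
WHERE vd.ivehicle_id = 2 AND dp.dirty_data = 1;
```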
Simplified case
The issue can also be reproduced in a simpler case, with only two tables holding a couple of records each, where the child table has no index on the column referencing the parent table.
Create the `parent` table:
```sql
CREATE TABLE `parent` (
  `id` int(10) unsigned NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB
```
Create the `child` table:
```sql
CREATE TABLE `child` (
  `id` int(10) unsigned NOT NULL,
  `parent_id` int(10) unsigned DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB
```
Fill the tables:
```sql
INSERT INTO `parent` (id) VALUES (1), (2);
INSERT INTO `child` (id, parent_id) VALUES (1, NULL), (2, NULL);
```
Test in two parallel transactions:
- Transaction 1

  ```sql
  SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
  SET AUTOCOMMIT=0;
  BEGIN;
  DELETE c FROM child c INNER JOIN parent p ON p.id = c.parent_id WHERE p.id = 1;
  ```
- Transaction 2

  ```sql
  SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
  SET AUTOCOMMIT=0;
  BEGIN;
  DELETE c FROM child c INNER JOIN parent p ON p.id = c.parent_id WHERE p.id = 2;
  ```
The common part in both cases is that MySQL doesn't use indexes. I believe that is why the entire table gets locked.
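One way to test that hypothesis (a sketch, not part of the original report) is to give the child table an index on the referencing column and re-run the two transactions:

```sql
-- Hypothetical fix for the simplified case: index the join column so the
-- DELETE can locate child rows by parent_id instead of scanning the whole
-- table (and locking every row it scans).
ALTER TABLE `child` ADD KEY `child__parent` (`parent_id`);
```

If the blocking disappears with this index in place, that supports the theory that the full scan, rather than the `DELETE` itself, is what locks unrelated rows.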
Our Solution
The only solution we can see for now is to increase the default lock wait timeout from 50 seconds to 500 seconds, to let the threads finish cleaning up, and keep our fingers crossed.
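For what it's worth, the timeout is controlled by the `innodb_lock_wait_timeout` server variable; 500 here matches the value described above. On MySQL 5.5 it can be changed at runtime (on some 5.1 builds without the InnoDB plugin it may require a restart):

```sql
-- Raise the lock wait timeout from the default 50s to 500s.
-- GLOBAL affects new sessions; SESSION affects only the current one.
SET GLOBAL innodb_lock_wait_timeout = 500;
SET SESSION innodb_lock_wait_timeout = 500;
```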
Any help appreciated.
Best Answer
NEW ANSWER (MySQL-style dynamic SQL): OK, this one tackles the problem in the way one of the other posters described: reversing the order in which mutually incompatible exclusive locks are acquired, so that however many of them occur, they are held only for the shortest possible time at the end of the transaction.
This is accomplished by separating the read part of the statement into its own SELECT and dynamically generating a DELETE statement that is forced to run last by order of statement appearance, and which affects only the proc_warnings table.
A demo is available at SQL Fiddle:

- This link shows the schema with sample data, and a simple query for rows that match on `ivehicle_id=2`. Two rows result, as none of them have been deleted.
- This link shows the same schema and sample data, but passes the value 2 to the DeleteEntries stored program, telling the SP to delete the `proc_warnings` entries for `ivehicle_id=2`. The simple query for rows returns no results, as they have all been successfully deleted.

The demo links only demonstrate that the code works as intended to delete. A user with the proper test environment can comment on whether this solves the problem of the blocked thread. Here is the code as well for convenience:
This is the syntax to call the program from within a transaction:
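Presumably something along these lines; the exact argument list depends on how `DeleteEntries` is declared:

```sql
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
SET autocommit = 0;
BEGIN;
CALL DeleteEntries(2);  -- delete proc_warnings entries for ivehicle_id = 2
COMMIT;
```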
ORIGINAL ANSWER (I still think it's not too shabby): It looks like two issues: 1) a slow query, and 2) unexpected locking behavior.
As regards issue #1, slow queries are often resolved by the same two techniques in tandem: query statement simplification, and useful additions of or modifications to indexes. You yourself already made the connection to indexes: without them the optimizer cannot narrow its search to a limited set of rows, and every extra row scanned in each joined table multiplies the amount of extra work that must be done.
REVISED AFTER SEEING THE POST OF SCHEMA AND INDEXES: I imagine you'll get the most performance benefit for your query by making sure you have a good index configuration. You can go for better delete performance, and possibly even better delete performance still, at the cost of larger indexes and perhaps noticeably slower insert performance on the tables to which the additional index structure is added.
SOMEWHAT BETTER:
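One plausible reading of such a change (an assumption on my part, consistent with the revision note that `dirty_data` should come first in index order):

```sql
-- Hypothetical "somewhat better" index: reorder the composite so the
-- equality-tested dirty_data column leads, followed by the join column.
ALTER TABLE `day_position`
  DROP KEY `day_position__id`,
  ADD KEY `day_position__dd_ivd` (`dirty_data`, `ivehicle_day_id`);
```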
REVISED HERE TOO: Since the query takes as long as it does to run, I'd leave `dirty_data` in the index, and I got the order wrong too when I placed it after `ivehicle_day_id`: it should come first.
But if I had my hands on it at this point, since there must be a good amount of data to make it take that long, I would just go for fully covering indexes, to make sure I was getting the best indexing my troubleshooting time could buy, if nothing else to rule that part of the problem out.
BEST/COVERING INDEXES:
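Again as a sketch with assumed names, covering indexes for this join path might look like the following:

```sql
-- Hypothetical covering indexes: each index carries every column the query
-- needs from its table. InnoDB secondary indexes implicitly append the
-- primary key, so `id` need not be listed explicitly.
ALTER TABLE `ivehicle_days`
  ADD KEY `ivehicle_days__cover` (`ivehicle_id`);                 -- + implicit id
ALTER TABLE `day_position`
  ADD KEY `day_position__cover` (`dirty_data`, `ivehicle_day_id`, `transaction_id`);
ALTER TABLE `proc_warnings`
  ADD KEY `proc_warnings__cover` (`transaction_id`);              -- + implicit id
```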
There are two performance optimization goals sought by the last two change suggestions:
1) If the search keys for the subsequently accessed tables are not the same as the clustered key values returned from the currently accessed table, we eliminate what would otherwise be a second set of index-seek-with-scan operations on the clustered index.
2) Even if that is not the case, there is still at least the possibility that the optimizer can select a more efficient join algorithm, since the indexes keep the required join keys in sorted order.
Your query seems about as simplified as it can be (copied here in case it is edited later):
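```sql
DELETE pw
FROM proc_warnings pw
INNER JOIN day_position dp
ON dp.transaction_id = pw.transaction_id
INNER JOIN ivehicle_days vd
ON vd.id = dp.ivehicle_day_id
WHERE vd.ivehicle_id=? AND dp.dirty_data=1
```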
Unless, of course, there is something about the written join order that affects how the query optimizer proceeds, in which case you could try some of the rewrite suggestions others have provided, perhaps including this one with (optional) index hints:
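A hypothetical version of such a rewrite, starting the join from the most selective filter; the index hints name the existing indexes from the schema above and can be dropped if they don't help:

```sql
-- Hypothetical rewrite: drive the join from ivehicle_days, with optional
-- hints steering the optimizer toward the existing secondary indexes.
DELETE pw
FROM ivehicle_days vd
INNER JOIN day_position dp USE INDEX (day_position__id)
    ON dp.ivehicle_day_id = vd.id AND dp.dirty_data = 1
INNER JOIN proc_warnings pw USE INDEX (proc_warnings__transaction)
    ON pw.transaction_id = dp.transaction_id
WHERE vd.ivehicle_id = ?;
```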
As regards issue #2, the unexpected locking behavior: I would guess it is the index that's locked because the row to be locked lives in a clustered index, i.e. the row of data itself resides in the index.
It would be locked, because:
1) according to http://dev.mysql.com/doc/refman/5.1/en/innodb-locks-set.html
You also mentioned above:
and provided the following reference for that:
http://dev.mysql.com/doc/refman/5.1/en/set-transaction.html#isolevel_read-committed
Which states the same as you, except that according to that same reference there is a condition upon which a lock shall be released:
Which is reiterated as well at this manual page http://dev.mysql.com/doc/refman/5.1/en/innodb-record-level-locks.html
So, we're told that the WHERE condition must be evaluated before the lock can be released. Unfortunately, we're not told when the WHERE condition is evaluated, and that is probably subject to change from one optimizer plan to another. But it does tell us that lock release depends somehow on query execution performance, the optimization of which, as discussed above, depends on careful writing of the statement and judicious use of indexes. It could also be improved by better table design, but that is probably best left to a separate question.
The database can't lock records within the index if there are none.
This could mean numerous things, such as (but probably not limited to): a different execution plan due to a change in statistics, or a lock too brief to be observed because a much smaller data set makes the join, and thus the execution, much faster.