Mysql – Minimize locking on UPDATE with multiple WHERE columns

deadlockinnodblockingMySQL

I have a problematic query that's causing a lock timeout:

UPDATE <some_table> SET col1=<some value> 
WHERE col1 IS NULL AND col2 > <some value> 
ORDER BY col2 
LIMIT 100

I have two issues here:

I have multiple servers committing the query simultaneously, they lock each other out and I get lock timeout or even deadlock sometimes. Optimally the servers should update mutually exclusive rows so locking shouldn't happen at all. Is there a way I can skip any locked row in the update?
If I can't avoid locking, and I already have an index for col1 and another one for col2, will Innodb lock all rows that satisfy any condition in the WHERE clause or only the ones that satisfy both conditions? If the answer is the former, can I add an index for the two columns together or do I also need to remove the indexes that I have (for each column separately)?

Best Answer

You will need to lock all the rows yourself before each UPDATE.

See the MySQL Documentation on SELECT ... LOCK FOR UPDATE. This performs an exclusive lock on all the rows you pass through. Then, you can follow up with the needed UPDATE against the table.

In your particular case, you would do this:

SELECT * FROM <some_table>
WHERE row1 IS NULL AND row2 > <some value> 
ORDER BY row2 LIMIT 100;
UPDATE <some_table> SET row1=<some value> 
WHERE row1 IS NULL AND row2 > <some value> 
ORDER BY row2 LIMIT 100

INDEXING

You should index the table fully to support all possible ways you will be querying the data. Notwithstanding, you must alternate between SELECT ... FOR UPDATE and UPDATE.
Since you have both row1 and row2 in the WHERE clause, you should have an index with both columns in it.
There is one warning: If these columns are indexed, expect some slowness because the column is being updated and the BTREE index pages are being updated per row. You should also expect rapid growth of the insert buffer section of ibdata1 (See InnoDB Map)

I have many posts on the subject of SELECT ... FOR UPDATE and SELECT ... LOCK IN SHARED MODE.

Case Study I went through with @RedBlueThing
Aug 08, 2011 : Are InnoDB Deadlocks exclusive to INSERT/UPDATE/DELETE?
Oct 18, 2011 : Preventing mysql deadlocks in your php application that uses SELECT… LOCK IN SHARE MODE
Jan 02, 2012 : LOCK IN SHARE MODE
Nov 19, 2012 : How long can Deadlock info hold in innodb status page?
Dec 13, 2012 : MySQL InnoDB locks primary key on delete even in READ COMMITTED
Feb 03, 2013 : MySQL InnoDB locking on combined UPDATE-JOIN statements
Mar 12, 2013 : How I prevent deadlock occurrence in my application?

UPDATE 2013-03-17 19:21 EDT

Since you have 9 WebServers hitting the DB Server, try this

On WebServer1 run

SELECT * FROM <some_table>
WHERE row1 IS NULL AND row2 > <some value> 
ORDER BY row2 LIMIT 0,100;
UPDATE <some_table> SET row1=<some value> 
WHERE row1 IS NULL AND row2 > <some value> 
ORDER BY row2 LIMIT 0,100;

On WebServer2 run

SELECT * FROM <some_table>
WHERE row1 IS NULL AND row2 > <some value> 
ORDER BY row2 LIMIT 100,100;
UPDATE <some_table> SET row1=<some value> 
WHERE row1 IS NULL AND row2 > <some value> 
ORDER BY row2 LIMIT 100,100;

On WebServer3 run

SELECT * FROM <some_table>
WHERE row1 IS NULL AND row2 > <some value> 
ORDER BY row2 LIMIT 200,100;
UPDATE <some_table> SET row1=<some value> 
WHERE row1 IS NULL AND row2 > <some value> 
ORDER BY row2 LIMIT 200,100;

All the way to WebServer9, run

SELECT * FROM <some_table>
WHERE row1 IS NULL AND row2 > <some value> 
ORDER BY row2 LIMIT 800,100;
UPDATE <some_table> SET row1=<some value> 
WHERE row1 IS NULL AND row2 > <some value> 
ORDER BY row2 LIMIT 800,100;

You will have to place some PHP header file that unique identifies which machine runs which version of the query.

Related Solutions

Sql-server – Deadlock when update different rows with non-clustered index

...why with clustered index, the deadlock is still there (though hit rate seems to be dropped)

The question isn't precisely clear (e.g. how many updates and to which id values are in each transaction) but one obvious deadlock scenario arises with multiple single-row updates within a single transaction, where there is an overlap of [id] values, and the ids are updated in a different [id] order:

[T1]: Update id 2; Update id 1;
[T2]: Update id 1; Update id 2;

Deadlock sequence: T1 (u2), T2 (u1), T1 (u1) wait, T2 (u2) wait.

This deadlock sequence might be avoided by updating strictly in id order within each transaction (acquiring locks in the same order on the same path).

When use the clustered index, there is an exclusive lock on the key as well as an exclusive lock on RID when do update, which is expected; while there are two exclusive lock on two different RID if non-clustered index is used, which confuses me.

With a unique clustered index on id, an exclusive lock is taken on the clustering key to protect writes to the in-row data. A separate RID exclusive lock is required to protect the write to the LOB text column, which is stored on a separate data page by default.

When the table is a heap with only a nonclustered index on id, two things happen. First, one RID exclusive lock relates to the heap in-row data, and the other is the lock on the LOB data as before. The second effect is that a more complex execution plan is required.

With a clustered index and a simple single-value equality predicate update, the query processor can apply an optimization that performs the update (read and write) in a single operator, using a single path:

Single-operator update

The row is located and updated in a single seek operation, requiring only exclusive locks (no update locks are needed). An example locking sequence using your sample table:

acquiring IX lock on OBJECT: 6:992930809:0 -- TABLE
acquiring IX lock on PAGE: 6:1:59104 -- INROW
acquiring X lock on KEY: 6:72057594233618432 (61a06abd401c) -- INROW
acquiring IX lock on PAGE: 6:1:59091 -- LOB
acquiring X lock on RID: 6:1:59091:1 -- LOB

releasing lock reference on PAGE: 6:1:59091 -- LOB
releasing lock reference on RID: 6:1:59091:1 -- LOB
releasing lock reference on KEY: 6:72057594233618432 (61a06abd401c) -- INROW
releasing lock reference on PAGE: 6:1:59104 -- INROW

With only a nonclustered index, the same optimization cannot be applied because we need to read from one b-tree structure and write another. The multi-path plan has separate read and write phases:

Multi-iterator update

This acquires update locks when reading, converting to exclusive locks if the row qualifies. Example lock sequence with the schema given:

acquiring IX lock on OBJECT: 6:992930809:0 -- TABLE
acquiring IU lock on PAGE: 6:1:59105 -- NC INDEX
acquiring U lock on KEY: 6:72057594233749504 (61a06abd401c) -- NC INDEX
acquiring IU lock on PAGE: 6:1:59104 -- HEAP
acquiring U lock on RID: 6:1:59104:1 -- HEAP
acquiring IX lock on PAGE: 6:1:59104 -- HEAP convert to X
acquiring X lock on RID: 6:1:59104:1 -- HEAP convert to X
acquiring IU lock on PAGE: 6:1:59091 -- LOB
acquiring U lock on RID: 6:1:59091:1 -- LOB

releasing lock reference on PAGE: 6:1:59091 
releasing lock reference on RID: 6:1:59091:1
releasing lock reference on RID: 6:1:59104:1
releasing lock reference on PAGE: 6:1:59104 
releasing lock on KEY: 6:72057594233749504 (61a06abd401c)
releasing lock on PAGE: 6:1:59105

Note the LOB data is read and written at the Table Update iterator. The more complex plan and multiple read and write paths increase the chances of a deadlock.

Finally, I can't help but notice the data types used in the table definition. You should not use the deprecated text data type for new work; the alternative, if you really need the ability to store up to 2GB of data in this column, is varchar(max). One important difference between text and varchar(max) is that text data is stored off-row by default, while varchar(max) stores in-row by default.

Use Unicode types only if you need that flexibility (e.g. it is hard to see why an IP address would need Unicode). Also, choose appropriate length limits for your attributes - 255 everywhere seems unlikely to be correct.

Additional reading:
Deadlock and livelock common patterns
Bart Duncan's deadlock troubleshooting series

Tracing locks can be done in a variety of ways. SQL Server Express with Advanced Services (2014 & 2012 SP1 onward only) contains the Profiler tool, which is a supported way to view the details of lock acquisition and release.

Sql-server – How does the table scan in an UPDATE WHERE predicated on an unindexed column work

Order of table scan is never guaranteed. Order in which rows are locked is not guaranteed either. In addition, SQL Server has lock escalation, so you can't really say what the engine decides to lock, row, page, or table itself.

Thus, deadlock may happen even if values of @id are different in concurrent sessions, but rows of interest happened to reside on the same pages. In SQL Server 2008 and higher you can add hint ROWLOCK which reduces chances of lock escalation (but again it's not guaranteed). If UPDATE can't acquire lock it waits (LOCK_TIMEOUT specifies how long it's gonna wait ).