I didn't get a chance to rerun the trace and capture the more nicely formatted XDL file, but in the meantime I was able to resolve the issue by removing all the isolation level and locking hints from the trigger and modifying the application code to run the INSERT itself from within a serializable transaction.
The app uses LINQ to SQL, so the INSERT statement is dynamically generated and run under the default READ COMMITTED isolation level. From the deadlock trace, I could tell that the locks taken by the INSERT were conflicting with the locks taken by the trigger. The deadlock was happening even after removing all the isolation and locking hints from the trigger, so I surmised that the trigger's UPDATE statements were causing stronger locks than the INSERT statement. And when the competing simultaneous transaction tried to do the same thing, the deadlock occurred.
We were definitely on the right track with our previous attempted solutions: using higher isolation levels and stronger locking in the trigger. The problem is that the weaker locks were already taken by the INSERT statement generated by LINQ to SQL before the trigger ever fired. The solution is to explicitly start a transaction in our .NET code at the SERIALIZABLE isolation level, do the insert in that transaction, let the trigger run in the same transaction (without altering its isolation level or lock hints), and then commit and dispose the transaction. When I did that, I was no longer able to reproduce the issue (I even put the code back the old way and verified that the issue returned). Here is the new code:
' Do the insert in a serializable transaction to prevent simultaneous inserts from deadlocking
' due to the trigger, which reaches out to other rows.
Me.packageContext.Connection.Open()
Try
Using oTransaction = Me.packageContext.Connection.BeginTransaction(IsolationLevel.Serializable)
Me.packageContext.Transaction = oTransaction
Try
Me.packageContext.SubmitChanges()
oTransaction.Commit()
Catch ex As Exception
oTransaction.Rollback()
Throw
End Try
End Using
Finally
' We need to close the connection and null out the transaction to allow
' subsequent uses of the same DataContext to work properly outside this transaction.
Me.packageContext.Connection.Close()
Me.packageContext.Transaction = Nothing
End Try
Normally we avoid opening transactions from within client side code, because they can run long if the code does remote API calls, causing excessive blocking. But in this case we are isolating a very small bit of code, and we have robust error/rollback handling around it.
An alternate solution would be to implement stored procedures that do the INSERTs within explicit SERIALIZABLE transactions, and then alter the DataContext to call the stored procedure. However, that solution is less maintainable as it requires updating both the stored procedure and the DataContext definition every time the schema changes.
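As a rough sketch of what that alternative would look like (the procedure, table, and column names here are made up for illustration, not taken from the actual schema), the INSERT would run inside an explicit SERIALIZABLE transaction on the server:

```sql
-- Hypothetical sketch of the stored-procedure alternative.
-- The INSERT runs inside an explicit SERIALIZABLE transaction on the
-- server, so the trigger fires under the same strong isolation.
CREATE PROCEDURE dbo.InsertPackage
    @Name nvarchar(100)
AS
BEGIN
    SET XACT_ABORT ON;  -- roll back automatically on any error
    SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
    BEGIN TRANSACTION;
        INSERT INTO dbo.Packages (Name) VALUES (@Name);
        -- the trigger on dbo.Packages fires here, inside the same
        -- serializable transaction
    COMMIT TRANSACTION;
END;
```

The DataContext would then be mapped to call this procedure for inserts, which is exactly the extra maintenance burden described above.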
Finally, another potential alternate solution would be to examine the deadlock log in detail and find a way to adjust our indexes or keys to avoid the deadlock in the first place. However, that is more difficult and may not be possible with our design.
I think the solution above is the simplest and most understandable for future developers.
If you use EXPLAIN you'll see the ordering. For a random plan I concocted with some throwaway data and a similar query (ORDER BY ... LIMIT 1 FOR UPDATE) I got the plan:
 Limit  (cost=33.09..33.10 rows=1 width=26)
   ->  LockRows  (cost=33.09..39.88 rows=543 width=26)
         ->  Sort  (cost=33.09..34.45 rows=543 width=26)
               Sort Key: t_id
               ->  Seq Scan on m  (cost=0.00..30.38 rows=543 width=26)
                     Filter: (b_id <= 33)
(6 rows)
Here, you can see the rows are potentially locked before the limit is applied.
Now, in practice, the first row that gets matched will generally get locked and returned, causing the LockRows step on the later rows to get skipped ... but there's no guarantee of this, so you shouldn't rely on it.
In general, mixing row locking and LIMIT is a bad idea and should be avoided.
If you must do it, you'll want to do something like:
SELECT *
FROM my_table
WHERE id = (SELECT id FROM my_table
            WHERE mywhereclause
            ORDER BY ... LIMIT 1)
  AND mywhereclause
FOR UPDATE;
where you force a target row to be found, and only then locked.
You should repeat the WHERE clause in the outer query to ensure that the row still matches the predicate after any lock-wait occurs. Otherwise what can happen is this: the inner query finds the row and returns its ID, and you try to lock the row with that ID. PostgreSQL sees that someone else has a lock on that row, so it waits for that lock to be released. It then re-checks the WHERE clause of the outer query to make sure the row still matches (in case the row was deleted or updated). Because the inner query is an uncorrelated subquery it isn't re-evaluated, though, so the inner WHERE clause doesn't get re-run, and you can get back rows that no longer match it unless you repeat it in the outer WHERE clause.
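As a concrete illustration of that pattern (the table and column names here, jobs, status, and created_at, are hypothetical), a worker pulling one pending item might write:

```sql
-- Lock the oldest pending job. The status check is repeated in the
-- outer query so that a row claimed by another transaction while we
-- waited on its lock is filtered out instead of being returned stale.
SELECT *
FROM jobs
WHERE id = (SELECT id FROM jobs
            WHERE status = 'pending'
            ORDER BY created_at
            LIMIT 1)
  AND status = 'pending'
FOR UPDATE;
```

Note that this query can return zero rows when the lock-wait invalidates the candidate, so the caller must be prepared to retry.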
Mixing row limits and row locking is hard. Building correct queuing systems that actually perform better than a single worker is even harder. Use an off-the-shelf queuing system and save yourself a lot of hassle.
(PostgreSQL 9.5 will have SKIP LOCKED, which makes this much easier, but there won't be a stable release of it for over a year from now, so don't hold your breath.)
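For reference, once SKIP LOCKED is available, the same task reduces to a single query (again using the hypothetical jobs table from above):

```sql
-- Each worker grabs the oldest pending job that nobody else has
-- locked; rows locked by other workers are skipped rather than
-- waited on, so concurrent workers don't block each other.
SELECT *
FROM jobs
WHERE status = 'pending'
ORDER BY created_at
LIMIT 1
FOR UPDATE SKIP LOCKED;
```

Because locked rows are skipped outright, the re-check-the-WHERE-clause subtlety described above no longer applies.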
Best Answer
There is no deterministic ORDER BY in your locking query. So multiple transactions might try to obtain row locks in different (arbitrary) order and deadlock each other.
Also, why would you lock the whole table machines for a single update in magictools? If it's indeed a single UPDATE on magictools per transaction, there is no need to lock rows in machines manually at all. Just drop the command. (If there are multiple, then your question is lacking.)
If there is a good reason to lock many or all rows in machines, either do it in deterministic order (for instance, ORDER BY the primary key before taking the row locks), or lock the whole table instead, which is also cheaper:
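A sketch of both options, assuming machines has a primary key column named machine_id (the column name is an assumption for illustration):

```sql
-- Option 1: take the row locks in a deterministic order, so that
-- concurrent transactions acquire them in the same sequence and
-- cannot deadlock on lock ordering.
SELECT 1 FROM machines ORDER BY machine_id FOR SHARE;

-- Option 2: lock the whole table once instead, which is cheaper
-- than many individual row locks.
LOCK TABLE machines IN SHARE MODE;
```
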
SHARE MODE is enough for the FK case you have. There is even an example in the manual suggesting as much. Locking the table also disallows INSERT, of course, unlike the row-level locks in your example.