I didn't get a chance to rerun the trace and capture the more nicely formatted XDL file, but in the meantime I was able to resolve the issue by removing all the isolation level and locking hints from the trigger and modifying the application code to run the INSERT itself from within a serializable transaction.
The app uses LINQ to SQL, so the INSERT statement is dynamically generated and run under the default READ COMMITTED isolation level. From the deadlock trace, I could tell that the locks taken by the INSERT were conflicting with the locks taken by the trigger. The deadlock was happening even after removing all the isolation and locking hints from the trigger, so I surmised that the trigger's UPDATE statements were causing stronger locks than the INSERT statement. And when the competing simultaneous transaction tried to do the same thing, the deadlock occurred.
We were definitely on the right track with our previous attempted solutions: using higher isolation levels and stronger locking in the trigger. The problem is that the weaker locks were already taken by the INSERT statement generated by LINQ to SQL before the trigger ever fired. The solution is to explicitly start a transaction in our .NET code at the SERIALIZABLE isolation level, do the insert in that transaction, let the trigger run in the same transaction (without altering its isolation level or lock hints), and then commit and dispose the transaction. When I did that, I was no longer able to reproduce the issue (I even put the code back the old way and verified that the issue returned). Here is the new code:
' Do the insert in a serializable transaction to prevent simultaneous inserts from deadlocking
' due to the trigger, which reaches out to other rows.
Me.packageContext.Connection.Open()
Try
Using oTransaction = Me.packageContext.Connection.BeginTransaction(IsolationLevel.Serializable)
Me.packageContext.Transaction = oTransaction
Try
Me.packageContext.SubmitChanges()
oTransaction.Commit()
Catch ex As Exception
oTransaction.Rollback()
Throw
End Try
End Using
Finally
' We need to close the connection and null out the transaction to allow
' subsequent uses of the same DataContext to work properly outside this transaction.
Me.packageContext.Connection.Close()
Me.packageContext.Transaction = Nothing
End Try
Normally we avoid opening transactions from within client side code, because they can run long if the code does remote API calls, causing excessive blocking. But in this case we are isolating a very small bit of code, and we have robust error/rollback handling around it.
An alternate solution would be to implement stored procedures that do the INSERTs within explicit SERIALIZABLE transactions, and then alter the DataContext to call the stored procedure. However, that solution is less maintainable as it requires updating both the stored procedure and the DataContext definition every time the schema changes.
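As a rough sketch of what that alternative would look like (the procedure, table, and column names here are made up for illustration, not taken from the actual schema), the INSERT would run inside an explicit SERIALIZABLE transaction on the server:

```sql
-- Hypothetical sketch of the stored-procedure alternative.
-- The INSERT runs inside an explicit SERIALIZABLE transaction on the
-- server, so the trigger fires under the same strong isolation.
CREATE PROCEDURE dbo.InsertPackage
    @Name nvarchar(100)
AS
BEGIN
    SET XACT_ABORT ON;  -- roll back automatically on any error
    SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
    BEGIN TRANSACTION;
        INSERT INTO dbo.Packages (Name) VALUES (@Name);
        -- the trigger on dbo.Packages fires here, inside the same
        -- serializable transaction
    COMMIT TRANSACTION;
END;
```

The DataContext would then be mapped to call this procedure for inserts, which is exactly the extra maintenance burden described above.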
Finally, another potential alternate solution would be to examine the deadlock log in detail and find a way to adjust our indexes or keys to avoid the deadlock in the first place. However, that is more difficult and may not be possible with our design.
I think the solution above is the simplest and most understandable for future developers.
If you use EXPLAIN you'll see the ordering. For a random plan I concocted with some throwaway data and a similar query (ORDER BY ... LIMIT 1 FOR UPDATE) I got the plan:
 Limit  (cost=33.09..33.10 rows=1 width=26)
   ->  LockRows  (cost=33.09..39.88 rows=543 width=26)
         ->  Sort  (cost=33.09..34.45 rows=543 width=26)
               Sort Key: t_id
               ->  Seq Scan on m  (cost=0.00..30.38 rows=543 width=26)
                     Filter: (b_id <= 33)
(6 rows)
Here, you can see the rows are potentially locked before the limit is applied.
Now, in practice, the first row that gets matched will generally get locked and returned, causing the LockRows step on the later rows to get skipped ... but there's no guarantee of this, so you shouldn't rely on it.
In general, mixing row locking and LIMIT is a bad idea and should be avoided.
If you must do it, you'll want to do something like:
SELECT *
FROM my_table
WHERE id = (SELECT id FROM my_table
            WHERE mywhereclause
            ORDER BY ... LIMIT 1)
  AND mywhereclause
FOR UPDATE;
where you force a target row to be found, and only then locked.
You should repeat the WHERE clause in the outer query to ensure that the row still matches the predicate after any lock-wait occurs. Otherwise what can happen is this: the inner query finds the row and returns its ID, and you try to lock the row with that ID. PostgreSQL sees that someone else has a lock on that row, so it waits for that lock to be released. It then re-checks the WHERE clause of the outer query to make sure the row still matches (in case the row was deleted or updated). Because the inner query is an uncorrelated subquery it isn't re-evaluated, though, so the inner WHERE clause doesn't get re-run, and you can get back rows that no longer match it unless you repeat it in the outer WHERE clause.
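As a concrete illustration of that pattern (the table and column names here, jobs, status, and created_at, are hypothetical), a worker pulling one pending item might write:

```sql
-- Lock the oldest pending job. The status check is repeated in the
-- outer query so that a row claimed by another transaction while we
-- waited on its lock is filtered out instead of being returned stale.
SELECT *
FROM jobs
WHERE id = (SELECT id FROM jobs
            WHERE status = 'pending'
            ORDER BY created_at
            LIMIT 1)
  AND status = 'pending'
FOR UPDATE;
```

Note that this query can return zero rows when the lock-wait invalidates the candidate, so the caller must be prepared to retry.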
Mixing row limits and row locking is hard. Building correct queuing systems that actually perform better than a single worker is even harder. Use an off-the-shelf queuing system and save yourself a lot of hassle.
(PostgreSQL 9.5 will have SKIP LOCKED, which makes this much easier, but there won't be a stable release of it for over a year from now, so don't hold your breath.)
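For reference, once SKIP LOCKED is available, the same task reduces to a single query (again using the hypothetical jobs table from above):

```sql
-- Each worker grabs the oldest pending job that nobody else has
-- locked; rows locked by other workers are skipped rather than
-- waited on, so concurrent workers don't block each other.
SELECT *
FROM jobs
WHERE status = 'pending'
ORDER BY created_at
LIMIT 1
FOR UPDATE SKIP LOCKED;
```

Because locked rows are skipped outright, the re-check-the-WHERE-clause subtlety described above no longer applies.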
Best Answer
There is no deterministic ORDER BY in your locking query. So multiple transactions might try to obtain row locks in different (arbitrary) order and deadlock each other.
Also, why would you lock the whole table machines for a single update in magictools? If it's indeed a single UPDATE on magictools per transaction, there is no need to lock rows in machines manually at all. Just drop the command. (If there are multiple, then your question is lacking.)
If there is a good reason to lock many or all rows in machines, either do it in deterministic order (for instance, ORDER BY the primary key before taking the row locks), or lock the whole table instead, which is also cheaper:
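A sketch of both options, assuming machines has a primary key column named machine_id (the column name is an assumption for illustration):

```sql
-- Option 1: take the row locks in a deterministic order, so that
-- concurrent transactions acquire them in the same sequence and
-- cannot deadlock on lock ordering.
SELECT 1 FROM machines ORDER BY machine_id FOR SHARE;

-- Option 2: lock the whole table once instead, which is cheaper
-- than many individual row locks.
LOCK TABLE machines IN SHARE MODE;
```
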
SHARE MODE is enough for the FK case you have. There is even an example in the manual suggesting as much. Locking the table also disallows INSERT, of course, unlike the row-level locks in your example.