As was asked on stackoverflow but then for MySQL, I was wondering how this works in PostgreSQL. How many rows get locked when I do a "FOR UPDATE" in combination with a "LIMIT", a "ORDER BY" and some "WHERE"'s. And they update the returned row. I was hoping the "FOR UPDATE"-query would only lock ONE row but maybe I miss some implementations issue. (I'm planning to write some stress-test to reproduce this more reliably)
These queries cause a deadlock in my system:
(session1) select field1, field2, ... from documents where transaction_path='some/path' for update
(session2) select field1, field2, ... from documents where (transaction_path is null) and queue_id=2 and next_pickup_ts<now() order by next_pickup_ts limit 1 for update
(session3) select field1, field2, ... from documents where (transaction_path is null) and queue_id=2 and next_pickup_ts<now() order by next_pickup_ts limit 1 for update
But I do not understand why. (There is a unique index on transaction_path.)
Best Answer
If you use
EXPLAIN
you'll see the ordering. For a random plan I concocted with some throwaway data and similar query (ORDER BY ... LIMIT 1 FOR UPDATE
) I got the plan:Here, you can see the rows are potentially locked before the limit is applied.
Now, in practice, the first row that gets matched will generally get locked and returned, causing
LockRows
on the later rows to get skipped ... but there's no guarantee of this. So you shouldn't rely on it.In general, mixing row-locking and
LIMIT
is a bad idea that should generally be avoided.If you must do it, you'll want to do something like:
where you force a target row to be found, and only then locked.
You should repeat the WHERE clause in the outer query to ensure that the row still matches the predicate after any lock-wait occurs. Otherwise what can happen is that the inner query finds the row, returns the ID, and you try to lock the row with that ID. PostgreSQL sees that someone else has a lock on that row, so it waits for that lock to be released. Then it re-checks the
WHERE
clause on the outer query to make sure it still matches (in case the row was deleted, or wasUPDATE
d). Because the inner query is an uncorrelated subquery it won't re-calculate it, though, so the innerWHERE
clause doesn't get re-run, and you can get rows that no longer match the innerWHERE
clause returned if you don't repeat it in the outerWHERE
clause.Mixing row limits and row locking is hard. Building correct queuing systems that actually perform better than a single worker is even harder. Use an off the shelf queuing system and save yourself a lot of hassle.
(PostgreSQL 9.5 will have
SKIP LOCKED
, which makes this way easier, but there won't be a stable release of it for over a year from now, so don't hold your breath.)