I'm running concurrent Postgres queries like this:
```sql
UPDATE foo SET bar = bar + 1 WHERE baz = 1234
```
Each query affects a fixed number of rows, K, and since I can't find a way to enforce the order in which the rows are updated, I end up with deadlocks. Currently I work around the problem by enforcing the order by hand, but this means I have to execute many more queries than I normally would, while also raising the search complexity from O(log N + K) to O(K log N).
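Roughly, the workaround looks like this (a sketch; the ids are made up for illustration):

```sql
-- fetch the matching ids first, sorted, so every concurrent
-- session locks rows in the same (ascending) order
SELECT id FROM foo WHERE baz = 1234 ORDER BY id;

-- then one UPDATE per id, issued in that order: K separate
-- index lookups instead of one range scan, hence O(K log N)
UPDATE foo SET bar = bar + 1 WHERE id = 17;
UPDATE foo SET bar = bar + 1 WHERE id = 42;
```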
Is there a way to improve performance without ending up vulnerable to deadlocks? I suspect that replacing the `(baz)` index with a `(baz, id)` index might work, provided that Postgres updates the rows in the same order that it has scanned them. Is this an approach worth pursuing?
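For reference, the index swap I have in mind would be something like this (index names are made up):

```sql
DROP INDEX foo_baz_idx;                        -- hypothetical existing index
CREATE INDEX foo_baz_id_idx ON foo (baz, id);  -- proposed replacement
```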
Best Answer
There is no `ORDER BY` in an SQL `UPDATE` command. Postgres updates rows in arbitrary order. To avoid deadlocks with absolute certainty, you could run your statements under serializable transaction isolation. But that's more expensive, and you need to be prepared to repeat commands on serialization failure.
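A minimal sketch of the serializable approach; the retry loop itself has to live in the client:

```sql
-- the client must retry the whole transaction whenever it
-- fails with SQLSTATE 40001 (serialization_failure)
BEGIN ISOLATION LEVEL SERIALIZABLE;
UPDATE foo SET bar = bar + 1 WHERE baz = 1234;
COMMIT;
```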
Your best course of action is probably to lock rows explicitly with `SELECT ... ORDER BY ... FOR UPDATE` in a subquery, or with a standalone `SELECT` in a transaction - in default "read committed" isolation level. As Tom Lane has pointed out on pgsql-general, `FOR UPDATE` locks rows in the order they are returned, so a deterministic `ORDER BY` gives you a deterministic locking order. This should do the job:
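A sketch, assuming `foo` has a primary key `id`:

```sql
UPDATE foo f
SET    bar = f.bar + 1
FROM  (
   SELECT id
   FROM   foo
   WHERE  baz = 1234
   ORDER  BY id       -- consistent order across sessions prevents deadlocks
   FOR    UPDATE      -- lock rows before the outer UPDATE touches them
   ) upd
WHERE  f.id = upd.id;
```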
A multicolumn index on `(baz, bar)` might be perfect for performance. But since `bar` is obviously updated a lot, a single-column index on just `(baz)` might be even better. Depends on a couple of factors: How many rows per `baz`? Are HOT updates possible without the multicolumn index? ...
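The two candidates, spelled out (index names are illustrative):

```sql
-- covers the WHERE clause and includes bar, but indexing bar
-- rules out HOT updates, since bar changes on every UPDATE
CREATE INDEX foo_baz_bar_idx ON foo (baz, bar);

-- leaves bar unindexed, so updates to bar can stay HOT
CREATE INDEX foo_baz_idx ON foo (baz);
```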
If `baz` is updated concurrently, there is still an unlikely corner case with a chance for conflicts (see the notes on `FOR UPDATE` in the documentation).

Also, if you should have a unique constraint involving `bar`, consider a `DEFERRABLE` constraint to avoid unique violations within the same command. Related answer:
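A sketch of such a constraint (the constraint name is made up):

```sql
-- DEFERRABLE makes Postgres check uniqueness at the end of the
-- statement rather than row by row, so an UPDATE like
-- "SET bar = bar + 1" cannot trip over its own intermediate state
ALTER TABLE foo
  ADD CONSTRAINT foo_bar_uni UNIQUE (bar)
  DEFERRABLE INITIALLY IMMEDIATE;
```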