Optimizing Concurrent Updates in PostgreSQL

deadlocklockingpostgresqlupdate

I'm running concurrent Postgres queries like this:

UPDATE foo SET bar = bar + 1 WHERE baz = 1234

Each query affects the fixed K number of rows, and I can't find a way to enforce the order in which the rows are updated, I end up with deadlocks. Currently I fix the problem by enforcing the order by hand, but this means I have to execute many more queries than I normally would while also raising the search complexity from O(log N + K) to O(K log N).

Is there a way to improve performance without ending up vulnerable to deadlocks? I suspect that replacing the (baz) index with the (baz, id) index might work provided that Postgres updates the rows in the same order that it have scanned them, is this an approach worth pursuing?

Best Answer

There is no ORDER BY in an SQL UPDATE command. Postgres updates rows in arbitrary order:

To avoid deadlocks with absolute certainty, you could run your statements in serializable transaction isolation. But that's more expensive and you need to prepare to repeat commands on serialization failure.

Your best course of action is probably to lock explicitly with SELECT ... ORDER BY ... FOR UPDATE in a subquery or a standalone SELECT in a transaction - in default "read committed" isolation level. Quoting Tom Lane on pgsql-general:

Should be all right --- the FOR UPDATE locking is always the last step in the SELECT pipeline.

This should do the job:

BEGIN;

SELECT 1
FROM   foo 
WHERE  baz = 1234
ORDER  BY bar
FOR    UPDATE;

UPDATE foo
SET    bar = bar + 1
WHERE  baz = 1234;

COMMIT;

A multicolumn index on (baz, bar) might be perfect for performance. But since bar is obviously updated a lot, a single-column index on just (baz) might be even better. Depends on a couple of factors. How many rows per baz? Are HOT updates possible without the multicolumn index? ...

If baz is updated concurrently, there is still an unlikely corner case chance for conflicts (per documentation):

It is possible for a SELECT command running at the READ COMMITTED transaction isolation level and using ORDER BY and a locking clause to return rows out of order. ...

Also, if you should have a unique constraint involving bar, consider a DEFERRABLE constraint to avoid unique violations within the same command. Related answer: