PostgreSQL – Locking issue with concurrent DELETE / INSERT in PostgreSQL

Tags: concurrency, locking, postgresql, serialization

This is pretty simple, but I'm baffled by what PG does (v9.0).
We start with a simple table:

CREATE TABLE test (id INT PRIMARY KEY);

and a few rows:

INSERT INTO test VALUES (1);
INSERT INTO test VALUES (2);

Using my favorite JDBC query tool (ExecuteQuery), I connect two session windows to the database where this table lives. Both of them are transactional (i.e., auto-commit = false). Let's call them S1 and S2.

The same bit of code for each:

1:DELETE FROM test WHERE id=1;
2:INSERT INTO test VALUES (1);
3:COMMIT;

Now, run this in slow motion, executing one at a time in the windows.

S1-1 runs (1 row deleted)
S2-1 runs (but is blocked since S1 has a write lock)
S1-2 runs (1 row inserted)
S1-3 runs, releasing the write lock
S2-1 runs, now that it can get the lock. But reports 0 rows deleted. HUH???
S2-2 runs, reports a unique key constraint violation

Now, this works fine in SQL Server. When S2 does the delete, it reports 1 row deleted. And then S2's insert works fine.

I suspect that PostgreSQL is locking the index in the table where that row exists, whereas SQL Server locks the actual key value.

Am I right? Can this be made to work?

Best Answer

Mat and Erwin are both right, and I'm only adding another answer to further expand on what they said in a way that won't fit in a comment. Since their answers don't seem to satisfy everyone, and there was a suggestion that the PostgreSQL developers should be consulted (and I am one), I will elaborate.

The important point here is that under the SQL standard, within a transaction running at the READ COMMITTED transaction isolation level, the restriction is that the work of uncommitted transactions must not be visible. When the work of committed transactions becomes visible is implementation-dependent. What you are pointing out is a difference in how two products have chosen to implement that. Neither implementation is violating the requirements of the standard.

Here's what happens within PostgreSQL, in detail:

S1-1 runs (1 row deleted)

The old row is left in place, because S1 might still roll back, but S1 now holds a lock on the row so that any other session attempting to modify the row will wait to see whether S1 commits or rolls back. Any reads of the table can still see the old row, unless they attempt to lock it with SELECT FOR UPDATE or SELECT FOR SHARE.
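To illustrate that last point, here is a quick sketch (not part of the original scenario) using a hypothetical third session S3, opened while S1's DELETE is still uncommitted:

-- S3, after S1-1 has run but before S1 commits
SELECT * FROM test WHERE id = 1;             -- still returns the row; S1's DELETE is not yet visible
SELECT * FROM test WHERE id = 1 FOR UPDATE;  -- blocks, because it must wait for S1's lock on that row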

S2-1 runs (but is blocked since S1 has a write lock)

S2 now has to wait to see the outcome of S1. If S1 were to roll back rather than commit, S2 would delete the row. Note that if S1 inserted a new version before rolling back, the new version would never have been there from the perspective of any other transaction, nor would the old version have been deleted from the perspective of any other transaction.
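As a sketch of that alternative outcome (again, not part of the original sequence), suppose S1 had rolled back at step 3 instead of committing:

ROLLBACK;                            -- S1: roll back instead of committing at step 3
-- S2's blocked DELETE now wakes up and reports 1 row deleted (the original row was never removed)
INSERT INTO test VALUES (1);         -- S2: succeeds this time; no unique violation
COMMIT;                              -- S2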

S1-2 runs (1 row inserted)

This row is independent of the old one. If there had been an update of the row with id = 1, the old and new versions would be related, and S2 could delete the updated version of the row when it became unblocked. That a new row happens to have the same values as some row that existed in the past doesn't make it the same as an updated version of that row.
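One way to see that the new row is an independent tuple rather than a new version of the old one is to look at PostgreSQL's system columns (a quick illustration; ctid is a tuple's physical location and xmin is the ID of the transaction that inserted it):

-- run inside S1 after S1-2, before the COMMIT
SELECT ctid, xmin, id FROM test WHERE id = 1;
-- the freshly inserted row typically sits at a new ctid and carries S1's own xmin,
-- i.e. it is a brand-new tuple, not a successor version of the row deleted at S1-1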

S1-3 runs, releasing the write lock

So S1's changes are persisted. One row is gone. One row has been added.

S2-1 runs, now that it can get the lock. But reports 0 rows deleted. HUH???

What happens internally is that there is a pointer from one version of a row to the next version of that same row if it is updated. If the row is deleted, there is no next version. When a READ COMMITTED transaction awakens from a block on a write conflict, it follows that update chain to the end; if the row has not been deleted and still meets the selection criteria of the query, it will be processed. This row has been deleted, so S2's query moves on.
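To contrast, here is a sketch (not from the original question) of what happens when the update chain does lead somewhere, i.e. when S1 uses UPDATE instead of DELETE plus INSERT:

BEGIN;                                  -- S1
UPDATE test SET id = id WHERE id = 1;   -- S1: creates a new row version chained to the old one
DELETE FROM test WHERE id = 1;          -- S2: blocks on S1's row lock
COMMIT;                                 -- S1
-- S2's DELETE now follows the chain to the new version, re-checks "id = 1" against it,
-- finds that it still matches, and reports 1 row deleted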

S2 may or may not get to the new row during its scan of the table. If it does, it will see that the new row was created after S2's DELETE statement started, and so is not part of the set of rows visible to it.

If PostgreSQL were to restart S2's entire DELETE statement from the beginning with a new snapshot, it would behave the same as SQL Server. The PostgreSQL community has not chosen to do that for performance reasons. In this simple case you would never notice the difference in performance, but if you were ten million rows into a DELETE when you got blocked, you certainly would. There's a trade-off here, and PostgreSQL has chosen performance, since the faster version still complies with the requirements of the standard.

S2-2 runs, reports a unique key constraint violation

Of course, the row already exists. This is the least surprising part of the picture.
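The error looks something like this (exact wording varies by version; the constraint name comes from the implicit primary key index):

INSERT INTO test VALUES (1);         -- S2
-- ERROR:  duplicate key value violates unique constraint "test_pkey"
-- DETAIL:  Key (id)=(1) already exists.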

While there is some surprising behavior here, everything is in conformance with the SQL standard and within the bounds of what is "implementation-specific" according to the standard. It certainly can be surprising if you assume that some other implementation's behavior will be present in all implementations, but PostgreSQL tries very hard to avoid serialization failures in the READ COMMITTED isolation level, and it allows some behaviors that differ from other products in order to achieve that.

Now, personally I'm not a big fan of the READ COMMITTED transaction isolation level in any product's implementation. They all allow race conditions to create surprising behaviors from a transactional point of view. Once someone becomes accustomed to the weird behaviors allowed by one product, they tend to consider that "normal" and the trade-offs chosen by another product odd. But every product has to make some sort of trade-off for any mode not actually implemented as SERIALIZABLE. Where the PostgreSQL developers have chosen to draw the line in READ COMMITTED is to minimize blocking (reads don't block writes and writes don't block reads) and to minimize the chance of serialization failures.

The standard requires that SERIALIZABLE transactions be the default, but most products don't do that because it causes a performance hit compared to the more lax transaction isolation levels. Some products don't even provide truly serializable transactions when SERIALIZABLE is chosen -- most notably Oracle and versions of PostgreSQL prior to 9.1. But using truly SERIALIZABLE transactions is the only way to avoid surprising effects from race conditions, and SERIALIZABLE transactions must always either block to avoid race conditions or roll back some transactions to avoid a developing race condition. The most common implementation of SERIALIZABLE transactions is Strict Two-Phase Locking (S2PL), which has both blocking and serialization failures (in the form of deadlocks).
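For completeness, here is a sketch of the original scenario run under SERIALIZABLE (which in PostgreSQL 9.1 and later is truly serializable; the exact error text may vary by version). Instead of silently reporting 0 rows deleted, S2 gets a serialization failure and must retry:

BEGIN ISOLATION LEVEL SERIALIZABLE;  -- S1
BEGIN ISOLATION LEVEL SERIALIZABLE;  -- S2
DELETE FROM test WHERE id = 1;       -- S1
DELETE FROM test WHERE id = 1;       -- S2: blocks on S1's row lock
INSERT INTO test VALUES (1);         -- S1
COMMIT;                              -- S1
-- S2's DELETE now fails:
-- ERROR:  could not serialize access due to concurrent update
ROLLBACK;                            -- S2: must roll back and retry the whole transaction,
                                     --     which then sees S1's committed work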

Full disclosure: I worked with Dan Ports of MIT to add truly serializable transactions to PostgreSQL version 9.1 using a new technique called Serializable Snapshot Isolation.