Mat and Erwin are both right, and I'm only adding another answer to further expand on what they said in a way which won't fit in a comment. Since their answers don't seem to satisfy everyone, and there was a suggestion that PostgreSQL developers should be consulted, and I am one, I will elaborate.
The important point here is that under the SQL standard, within a transaction running at the READ COMMITTED transaction isolation level, the restriction is that the work of uncommitted transactions must not be visible. When the work of committed transactions becomes visible is implementation-dependent. What you are pointing out is a difference in how two products have chosen to implement that. Neither implementation is violating the requirements of the standard.
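For reference, the timeline discussed below can be reconstructed roughly as follows. This is only a sketch: the table name test and the single id column are placeholders for illustration, and the exact statements come from the original question.

```sql
-- Assumed setup (illustrative only):
CREATE TABLE test (id int PRIMARY KEY);
INSERT INTO test VALUES (1);

-- Session 1 (S1):
BEGIN;
DELETE FROM test WHERE id = 1;   -- S1-1: 1 row deleted

-- Session 2 (S2):
DELETE FROM test WHERE id = 1;   -- S2-1: blocks on S1's row lock

-- Session 1 (S1):
INSERT INTO test VALUES (1);     -- S1-2: 1 row inserted
COMMIT;                          -- S1-3: releases the write lock; S2-1 completes with 0 rows deleted

-- Session 2 (S2):
INSERT INTO test VALUES (1);     -- S2-2: unique key constraint violation
```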
Here's what happens within PostgreSQL, in detail:
S1-1 runs (1 row deleted)
The old row is left in place, because S1 might still roll back, but S1 now holds a lock on the row so that any other session attempting to modify the row will wait to see whether S1 commits or rolls back. Any reads of the table can still see the old row, unless they attempt to lock it with SELECT FOR UPDATE or SELECT FOR SHARE.
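For example (a sketch, using the assumed test table above), while S1's DELETE is still uncommitted:

```sql
-- In a third session, while S1's DELETE has not yet committed:
SELECT * FROM test WHERE id = 1;             -- returns the old row immediately
SELECT * FROM test WHERE id = 1 FOR UPDATE;  -- blocks until S1 commits or rolls back
```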
S2-1 runs (but is blocked since S1 has a write lock)
S2 now has to wait to see the outcome of S1. If S1 were to roll back rather than commit, S2 would delete the row. Note that if S1 inserted a new version before rolling back, the new version would never have been there from the perspective of any other transaction, nor would the old version have been deleted from the perspective of any other transaction.
S1-2 runs (1 row inserted)
This row is independent of the old one. If there had been an update of the row with id = 1, the old and new versions would be related, and S2 could delete the updated version of the row when it became unblocked. That a new row happens to have the same values as some row that existed in the past doesn't make it the same as an updated version of that row.
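To contrast (a sketch under the same assumptions): if S1 had updated the row instead of deleting and re-inserting it, S2 would follow the update chain to the new version when it unblocks, re-check its WHERE clause, and delete that version:

```sql
-- Session 1:
BEGIN;
UPDATE test SET id = 1 WHERE id = 1;   -- same value, but a new row version linked to the old one

-- Session 2 (blocks on the locked row):
DELETE FROM test WHERE id = 1;

-- Session 1:
COMMIT;

-- Session 2 unblocks, follows the update chain, re-checks id = 1 against the
-- new version, and reports: DELETE 1
```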
S1-3 runs, releasing the write lock
So S1's changes are persisted. One row is gone. One row has been added.
S2-1 runs, now that it can get the lock. But reports 0 rows deleted. HUH???
What happens internally is that there is a pointer from one version of a row to the next version of that same row, if it is updated. If the row is deleted, there is no next version. When a READ COMMITTED transaction awakens from a block on a write conflict, it follows that update chain to the end; if the row has not been deleted and it still meets the selection criteria of the query, it will be processed. This row has been deleted, so S2's query moves on.
S2 may or may not get to the new row during its scan of the table. If it does, it will see that the new row was created after S2's DELETE statement started, and so it is not part of the set of rows visible to it.
If PostgreSQL were to restart S2's entire DELETE statement from the beginning with a new snapshot, it would behave the same as SQL Server. The PostgreSQL community has not chosen to do that, for performance reasons. In this simple case you would never notice the difference in performance, but if you were ten million rows into a DELETE when you got blocked, you certainly would. There's a trade-off here where PostgreSQL has chosen performance, since the faster version still complies with the requirements of the standard.
S2-2 runs, reports a unique key constraint violation
Of course, the row already exists. This is the least surprising part of the picture.
While there is some surprising behavior here, everything is in conformance with the SQL standard and within bounds of what is "implementation-specific" according to the standard. It certainly can be surprising if you are assuming that some other implementation's behavior will be present in all implementations, but PostgreSQL tries very hard to avoid serialization failures in the READ COMMITTED isolation level, and allows some behaviors that differ from other products in order to achieve that.
Now, personally I'm not a big fan of the READ COMMITTED transaction isolation level in any product's implementation. They all allow race conditions to create surprising behaviors from a transactional point of view. Once someone becomes accustomed to the weird behaviors allowed by one product, they tend to consider that "normal" and the trade-offs chosen by another product odd. But every product has to make some sort of trade-off for any mode not actually implemented as SERIALIZABLE. Where PostgreSQL developers have chosen to draw the line in READ COMMITTED is to minimize blocking (reads don't block writes and writes don't block reads) and to minimize the chance of serialization failures.
The standard requires that SERIALIZABLE transactions be the default, but most products don't do that because it causes a performance hit compared to the more lax transaction isolation levels. Some products don't even provide truly serializable transactions when SERIALIZABLE is chosen -- most notably Oracle and versions of PostgreSQL prior to 9.1. But using truly SERIALIZABLE transactions is the only way to avoid surprising effects from race conditions, and SERIALIZABLE transactions must always either block to avoid race conditions or roll back some transactions to head off a developing race condition. The most common implementation of SERIALIZABLE transactions is Strict Two-Phase Locking (S2PL), which has both blocking and serialization failures (in the form of deadlocks).
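For completeness, here is a sketch of what that looks like in PostgreSQL 9.1 or later, again using the assumed test table: both sessions run at SERIALIZABLE, and one of them may be rolled back with a serialization failure that the application is expected to retry.

```sql
-- Run in both sessions:
BEGIN ISOLATION LEVEL SERIALIZABLE;
DELETE FROM test WHERE id = 1;
INSERT INTO test VALUES (1);
COMMIT;

-- One of the two transactions may fail with SQLSTATE 40001 (serialization_failure);
-- the client then simply retries the whole transaction.
```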
Full disclosure: I worked with Dan Ports of MIT to add truly serializable transactions to PostgreSQL version 9.1 using a new technique called Serializable Snapshot Isolation.
Best Answer
Without a UNIQUE constraint standing in the way, your original query works just fine. See below. I would throw in a separate CTE to provide all input values once:
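The original statement isn't reproduced here, so the following is only a sketch of the kind of query described: one CTE providing the input values once, a data-modifying CTE doing the DELETE, and the outer INSERT reusing the same input. The table a with columns b and c (suggested by the UNIQUE index on (a.b) mentioned below) and the sample values are placeholders:

```sql
WITH input (b, c) AS (            -- provide all input values once
   VALUES
      (1, 'foo')                  -- may need explicit type casts, see below
    , (2, 'bar')
   )
, del AS (                        -- delete the rows matching the input
   DELETE FROM a
   USING  input i
   WHERE  a.b = i.b
   )
INSERT INTO a (b, c)              -- insert the same input values
SELECT b, c FROM input;
```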
The DELETE cannot delete rows from the INSERT in the same statement, as both see the same snapshot of the underlying tables. Meaning, the DELETE cannot see the rows entered by the INSERT in the other CTE. Both are executed at the same time, virtually.

That's also the reason why this cannot work with a UNIQUE index on (a.b): uniqueness is enforced at all times, and the INSERT still sees the rows being deleted in the other CTE. The obvious alternative would be an UPSERT. But that cannot be the case here, since you mentioned that there is no PK and the DELETE can delete 0-n rows. As a_horse commented, it would work with a deferrable constraint, though.
But deferrable constraints are considerably more expensive and don't work with FK constraints, nor as arbiter in UPSERT statements ...
Note that a free-standing VALUES expression may require explicit type casts.
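For example (with made-up values), explicit casts in the first row remove the ambiguity:

```sql
SELECT *
FROM  (
   VALUES
      (1::int, 'foo'::text)       -- explicit casts pin down the column types
    , (2     , 'bar')
   ) AS input (b, c);
```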
That said, I don't see how this is superior to a DELETE and a separate INSERT in a single transaction - which also works with UNIQUE constraints. If the statement is used a lot, consider a function:
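A sketch of such a function, wrapping the same kind of statement as above (the name f_del_ins and the parameter types are placeholders):

```sql
CREATE OR REPLACE FUNCTION f_del_ins(_b int, _c text)
  RETURNS void
  LANGUAGE sql AS
$func$
   WITH del AS (
      DELETE FROM a WHERE a.b = _b
      )
   INSERT INTO a (b, c)
   VALUES (_b, _c);
$func$;
```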
Then the call is simple:
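With the placeholder signature above:

```sql
SELECT f_del_ins(1, 'foo');
```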
If you only need it for some sessions, a temporary function might be an option.
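A temporary function is simply created in the session's temporary schema pg_temp and disappears at the end of the session (again a sketch with placeholder names):

```sql
CREATE FUNCTION pg_temp.f_del_ins(_b int, _c text)
  RETURNS void
  LANGUAGE sql AS
$func$
   WITH del AS (DELETE FROM a WHERE a.b = _b)
   INSERT INTO a (b, c) VALUES (_b, _c);
$func$;

SELECT pg_temp.f_del_ins(1, 'foo');  -- calls to pg_temp functions must be schema-qualified
```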
Or a prepared statement with the above CTE:
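A sketch, using the same placeholder names as above:

```sql
PREPARE del_ins (int, text) AS
WITH del AS (
   DELETE FROM a WHERE a.b = $1
   )
INSERT INTO a (b, c)
VALUES ($1, $2);
```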
Call:
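With the prepared statement above:

```sql
EXECUTE del_ins(1, 'foo');
```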
Most languages have their own implementation of prepared statements using libpq ... Both the prepared statement and the function know their input types, so explicit type casts are not required (except in special cases).