Mat and Erwin are both right, and I'm only adding another answer to further expand on what they said in a way which won't fit in a comment. Since their answers don't seem to satisfy everyone, and there was a suggestion that PostgreSQL developers should be consulted, and I am one, I will elaborate.
The important point here is that under the SQL standard, within a transaction running at the READ COMMITTED transaction isolation level, the restriction is that the work of uncommitted transactions must not be visible. When the work of committed transactions becomes visible is implementation-dependent. What you are pointing out is a difference in how two products have chosen to implement that. Neither implementation is violating the requirements of the standard.
Here's what happens within PostgreSQL, in detail:
S1-1 runs (1 row deleted)
The old row is left in place, because S1 might still roll back, but S1 now holds a lock on the row, so any other session attempting to modify the row will wait to see whether S1 commits or rolls back. Any reads of the table can still see the old row, unless they attempt to lock it with SELECT FOR UPDATE or SELECT FOR SHARE.
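To make this concrete, here is a minimal two-session sketch of that locking behavior. The table and column names (a table t with primary key id) are illustrative assumptions, not from the original scenario:

```sql
-- Hypothetical setup: CREATE TABLE t (id int PRIMARY KEY, val text);

-- Session S1
BEGIN;
DELETE FROM t WHERE id = 1;   -- old row version remains on disk; S1 holds a row lock

-- Session S2, concurrently
SELECT * FROM t WHERE id = 1;             -- still sees the old row (snapshot read)
SELECT * FROM t WHERE id = 1 FOR UPDATE;  -- blocks until S1 commits or rolls back
```

Note that the plain SELECT returns immediately; only the locking read has to wait for S1's outcome.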
S2-1 runs (but is blocked since S1 has a write lock)
S2 now has to wait to see the outcome of S1. If S1 were to roll back rather than commit, S2 would delete the row. Note that if S1 inserted a new version before rolling back, the new version would never have been there from the perspective of any other transaction, nor would the old version have been deleted from the perspective of any other transaction.
S1-2 runs (1 row inserted)
This row is independent of the old one. If there had been an update of the row with id = 1, the old and new versions would be related, and S2 could delete the updated version of the row when it became unblocked. That a new row happens to have the same values as some row that existed in the past doesn't make it the same as an updated version of that row.
S1-3 runs, releasing the write lock
So S1's changes are persisted. One row is gone. One row has been added.
S2-1 runs, now that it can get the lock. But reports 0 rows deleted. HUH???
What happens internally is that there is a pointer from one version of a row to the next version of that same row, if it is updated. If the row is deleted, there is no next version. When a READ COMMITTED transaction awakens from a block on a write conflict, it follows that update chain to the end; if the row has not been deleted and it still meets the selection criteria of the query, it will be processed. This row has been deleted, so S2's query moves on.
S2 may or may not get to the new row during its scan of the table. If it does, it will see that the new row was created after S2's DELETE statement started, and so it is not part of the set of rows visible to it.
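By contrast, if S1 had updated the row rather than deleting it and inserting a new one, S2 would follow the update chain to the committed new version. A sketch, again assuming a hypothetical table t with primary key id containing a row with id = 1:

```sql
-- Session S1
BEGIN;
UPDATE t SET val = 'new' WHERE id = 1;  -- new version is linked to the old one

-- Session S2 (blocks on S1's row lock)
DELETE FROM t WHERE id = 1;

-- Session S1
COMMIT;

-- S2 now wakes, follows the update chain to the committed new version,
-- re-checks its WHERE clause against it, and deletes it: reports 1 row deleted.
```

This is exactly why the DELETE + INSERT in the original scenario reports 0 rows deleted: the freshly inserted row is not part of any update chain reachable from the deleted row.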
If PostgreSQL were to restart S2's entire DELETE statement from the beginning with a new snapshot, it would behave the same as SQL Server. The PostgreSQL community has not chosen to do that, for performance reasons. In this simple case you would never notice the difference in performance, but if you were ten million rows into a DELETE when you got blocked, you certainly would. There is a trade-off here where PostgreSQL has chosen performance, since the faster version still complies with the requirements of the standard.
S2-2 runs, reports a unique key constraint violation
Of course, the row already exists. This is the least surprising part of the picture.
While there is some surprising behavior here, everything is in conformance with the SQL standard and within the bounds of what is "implementation-specific" according to the standard. It certainly can be surprising if you are assuming that some other implementation's behavior will be present in all implementations, but PostgreSQL tries very hard to avoid serialization failures in the READ COMMITTED isolation level, and allows some behaviors that differ from other products in order to achieve that.
Now, personally I'm not a big fan of the READ COMMITTED transaction isolation level in any product's implementation. They all allow race conditions to create surprising behaviors from a transactional point of view. Once someone becomes accustomed to the weird behaviors allowed by one product, they tend to consider that "normal" and the trade-offs chosen by another product odd. But every product has to make some sort of trade-off for any mode not actually implemented as SERIALIZABLE. Where PostgreSQL developers have chosen to draw the line in READ COMMITTED is to minimize blocking (reads don't block writes and writes don't block reads) and to minimize the chance of serialization failures.
The standard requires that SERIALIZABLE transactions be the default, but most products don't do that because it causes a performance hit compared to the more lax transaction isolation levels. Some products don't even provide truly serializable transactions when SERIALIZABLE is chosen -- most notably Oracle and versions of PostgreSQL prior to 9.1. But using truly SERIALIZABLE transactions is the only way to avoid surprising effects from race conditions, and SERIALIZABLE transactions must always either block to avoid race conditions or roll back some transactions to avoid a developing race condition. The most common implementation of SERIALIZABLE transactions is Strict Two-Phase Locking (S2PL), which has both blocking and serialization failures (in the form of deadlocks).
Full disclosure: I worked with Dan Ports of MIT to add truly serializable transactions to PostgreSQL version 9.1 using a new technique called Serializable Snapshot Isolation.
If I understand the request correctly, the goal is to delete batches of rows while DML operations are occurring on rows throughout the table. We want to delete a batch; however, if any rows within the range defined by that batch are locked, we must skip that batch and move on to the next one. We must then return to any batches that were not previously deleted and retry our original delete logic, repeating this cycle until all required batches of rows are deleted.
As has been mentioned, it is reasonable to use a READPAST hint and the READ COMMITTED (default) isolation level, in order to skip past ranges that may contain blocked rows. I will go a step further and recommend using the SERIALIZABLE isolation level and nibbling deletes.
SQL Server uses key-range locks to protect a range of rows implicitly included in a record set being read by a Transact-SQL statement while using the serializable transaction isolation level. You can find more here:
https://technet.microsoft.com/en-US/library/ms191272(v=SQL.105).aspx
With nibbling deletes, our goal is to isolate a range of rows and ensure that no changes will occur to those rows while we are deleting them, that is to say, we do not want phantom reads or insertions. The serializable isolation level is meant to solve this problem.
Before I demonstrate my solution, I would like to add that I am neither recommending switching your database's default isolation level to SERIALIZABLE nor claiming that my solution is the best. I merely wish to present it and see where we can go from here.
A few house-keeping notes:
- The SQL Server version that I am using is Microsoft SQL Server 2012 - 11.0.5343.0 (X64)
- My test database is using the FULL recovery model
To begin my experiment, I will set up a test database, a sample table, and I will fill the table with 2,000,000 rows.
USE [master];
GO
SET NOCOUNT ON;
IF DATABASEPROPERTYEX (N'test', N'Version') > 0
BEGIN
ALTER DATABASE [test] SET SINGLE_USER
WITH ROLLBACK IMMEDIATE;
DROP DATABASE [test];
END
GO
-- Create the test database
CREATE DATABASE [test];
GO
-- Set the recovery model to FULL
ALTER DATABASE [test] SET RECOVERY FULL;
-- Create a FULL database backup
-- in order to ensure we are in fact using
-- the FULL recovery model
-- I pipe it to dev null for simplicity
BACKUP DATABASE [test]
TO DISK = N'nul';
GO
USE [test];
GO
-- Create our table
IF OBJECT_ID('dbo.tbl','U') IS NOT NULL
BEGIN
DROP TABLE dbo.tbl;
END;
CREATE TABLE dbo.tbl
(
c1 BIGINT IDENTITY (1,1) NOT NULL
, c2 INT NOT NULL
) ON [PRIMARY];
GO
-- Insert 2,000,000 rows
INSERT INTO dbo.tbl
SELECT TOP 2000
number
FROM
master..spt_values
ORDER BY
number
GO 1000
At this point, we will need one or more indexes upon which the locking mechanisms of the SERIALIZABLE isolation level can act.
-- Add a clustered index
CREATE UNIQUE CLUSTERED INDEX CIX_tbl_c1
ON dbo.tbl (c1);
GO
-- Add a non-clustered index
CREATE NONCLUSTERED INDEX IX_tbl_c2
ON dbo.tbl (c2);
GO
Now, let us check that our 2,000,000 rows were created:
SELECT
COUNT(*)
FROM
tbl;
So, we have our database, table, indexes, and rows. Now, let us set up the experiment for nibbling deletes. First, we must decide how best to create a typical nibbling delete mechanism.
DECLARE
@BatchSize INT = 100
, @LowestValue BIGINT = 20000
, @HighestValue BIGINT = 20010
, @DeletedRowsCount BIGINT = 0
, @RowCount BIGINT = 1;
SET NOCOUNT ON;
-- No GO here: a batch separator would end the batch, and the variables
-- declared above would be out of scope in the loop below.
WHILE @DeletedRowsCount < ( @HighestValue - @LowestValue )
BEGIN
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRANSACTION
DELETE
FROM
dbo.tbl
WHERE
c1 IN (
SELECT TOP (@BatchSize)
c1
FROM
dbo.tbl
WHERE
c1 BETWEEN @LowestValue AND @HighestValue
ORDER BY
c1
);
SET @RowCount = ROWCOUNT_BIG();
COMMIT TRANSACTION;
SET @DeletedRowsCount += @RowCount;
WAITFOR DELAY '000:00:00.025';
CHECKPOINT;
END;
As you can see, I placed the explicit transaction inside the while loop. If you would like to limit log flushes, then feel free to place it outside the loop. Furthermore, since we are in the FULL recovery model, you may wish to create transaction log backups more often while running your nibbling delete operations, in order to keep your transaction log from growing excessively.
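For example, you could interleave log backups with the delete batches. The backup destination below is purely illustrative; substitute your own backup location:

```sql
-- Illustrative only: back up the transaction log between nibbles
-- so that, under the FULL recovery model, the log can be reused
-- rather than growing for the duration of the delete.
BACKUP LOG [test]
    TO DISK = N'X:\Backups\test_log.trn';
```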
So, I have a couple of goals with this setup. First, I want my key-range locks, so I try to keep the batches as small as possible. I also do not want to negatively impact concurrency on my "gigantic" table, so I want to take my locks and release them as fast as I can. Hence, I recommend that you make your batch sizes small.
Now, I want to provide a very short example of this deletion routine in action. We must open a new window within SSMS and delete one row from our table. I will do this within an implicit transaction using the default READ COMMITTED isolation level.
DELETE FROM
dbo.tbl
WHERE
c1 = 20005;
Was this row actually deleted?
SELECT
c1
FROM
dbo.tbl
WHERE
c1 BETWEEN 20000 AND 20010;
Yes, it was deleted.
Now, in order to see our locks, let us open a new window within SSMS and add a code snippet or two. I am using Adam Machanic's sp_whoisactive, which can be found here: sp_whoisactive
SELECT
DB_NAME(resource_database_id) AS DatabaseName
, resource_type
, request_mode
FROM
sys.dm_tran_locks
WHERE
DB_NAME(resource_database_id) = 'test'
AND resource_type = 'KEY'
ORDER BY
request_mode;
-- Our insert
sp_lock 55;
-- Our deletions
sp_lock 52;
-- Our active sessions
sp_whoisactive;
Now, we are ready to begin. In a new SSMS window, let us begin an explicit transaction that will attempt to re-insert the one row that we deleted. At the same time, we will fire off our nibbling delete operation.
The insert code:
BEGIN TRANSACTION
SET IDENTITY_INSERT dbo.tbl ON;
INSERT INTO dbo.tbl
( c1 , c2 )
VALUES
( 20005 , 1 );
SET IDENTITY_INSERT dbo.tbl OFF;
--COMMIT TRANSACTION;
Let us kick off both operations beginning with the insert and followed by our deletes. We can see the key-range locks and exclusive locks.
The insert generated these locks:
The nibbling delete/select is holding these locks:
Our insert is blocking our delete as expected:
Now, let us commit the insert transaction and see what is up.
And as expected, all transactions complete. Now, we must check to see whether the insert was a phantom or whether the delete operation removed it as well.
SELECT
c1
FROM
dbo.tbl
WHERE
c1 BETWEEN 20000 AND 20015;
In fact, the insert was deleted; so, no phantom insert was allowed.
So, in conclusion, I think the true intention of this exercise is not to try to track every single row, page, or table-level lock and determine whether an element of a batch is locked and would therefore require our delete operation to wait. That may have been the intent of the questioners; however, that task is herculean and basically impractical, if not impossible. The real goal is to ensure that no unwanted phenomena arise once we have isolated the range of our batch with locks of our own and then proceed to delete the batch. The SERIALIZABLE isolation level achieves this objective. The key is to keep your nibbles small, your transaction log under control, and to eliminate unwanted phenomena.
If you want speed, then don't build gigantically deep tables that cannot be partitioned, leaving you unable to use partition switching for the fastest results. The key to speed is partitioning and parallelism; the key to suffering is nibbles and live-locking.
Please let me know what you think.
I created some further examples of the SERIALIZABLE isolation level in action. They should be available at the links below.
Delete Operation
Insert Operation
Equality Operations - Key-Range Locks on Next Key Values
Equality Operations - Singleton Fetch of Existent Data
Equality Operations - Singleton Fetch of Nonexistent Data
Inequality Operations - Key-Range Locks on Range and Next Key Values
Best Answer
This is more a business process / UI race. It is often solved by deprecating / hiding values for a period before deleting and/or not accepting them as valid anymore.
One approach would be to add a DateDeprecated column to toys. Set it to the current date when you check and find that nobody has this toy as a favorite. Clear the field if/when the toy is selected as a favorite. In the UI, don't show/list deprecated toys at all, or don't show toys deprecated for over a certain period of time, like 7 days. At some point, delete long-deprecated toys, say after they have been deprecated for 14 days.
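A sketch of that approach in SQL; the toys and favorites table and column names are assumptions, and the date arithmetic will vary by dialect:

```sql
-- Hypothetical schema: toys(id, name) and favorites(user_id, toy_id)
ALTER TABLE toys ADD date_deprecated date NULL;

-- Deprecate a toy when a check finds nobody has it as a favorite
UPDATE toys
SET date_deprecated = CURRENT_DATE
WHERE id = 42
  AND NOT EXISTS (SELECT 1 FROM favorites WHERE favorites.toy_id = toys.id);

-- Clear the flag if/when the toy is selected as a favorite again
UPDATE toys SET date_deprecated = NULL WHERE id = 42;

-- In the UI, list only non-deprecated toys
SELECT id, name FROM toys WHERE date_deprecated IS NULL;

-- Eventually delete toys that have been deprecated for more than 14 days
DELETE FROM toys
WHERE date_deprecated < CURRENT_DATE - INTERVAL '14' DAY;
```

Because the actual DELETE only touches rows that have been hidden for two weeks, the business-process race between "pick a favorite" and "remove the toy" is resolved well before any row is physically removed.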