Sql-server – FIFO queue table for multiple workers in SQL Server

lockingqueuesql server

I was attempting to answer the following stackoverflow question:

After posting a somewhat naive answer, I figured I'd put my money where my mouth was and actually test the scenario I was suggesting, to be sure I wasn't sending the OP off on a wild goose chase. Well, it's turned out to be much harder than I thought (no surprise there to anyone, I'm sure).

Here's what I've tried and thought about:

  • First I tried a TOP 1 UPDATE with an ORDER BY inside a derived table, using ROWLOCK, READPAST. This yielded deadlocks and also processed items out of order. It must be as close to FIFO as possible, barring errors that require attempting to process the same row more than once.

  • I then tried selecting the desired next QueueID into a variable, using various combinations of READPAST, UPDLOCK, HOLDLOCK, and ROWLOCK to exclusively preserve the row for update by that session. All of the variations I tried suffered from the same issues as before as well as, for certain combinations with READPAST, complaining:

    You can only specify the READPAST lock in the READ COMMITTED or REPEATABLE READ isolation levels.

    This was confusing because it was READ COMMITTED. I have run into this before and it is frustrating.

  • Since I started writing this question, Remus Rusani posted a new answer to the question. I read his linked article and see that he is using destructive reads, since he said in his answer that it "is not realistically possible to hold on to locks for the duration of the web calls." After reading what his article says regarding hot spots and pages requiring locking to do any update or delete, I fear that even if I were able to work out the correct locks to do what I'm looking for, it would not be scalable and could not handle massive concurrency.

Right now I'm not sure where to go. Is it true that maintaining locks while the row is processed cannot be achieved (even if it did not support high tps or massive concurrency)? What am I missing?

In the hopes that people smarter than me and people more experienced than me can help out, below is the test script I was using. It is switched back to the TOP 1 UPDATE method but I left the other method in, commented out, in case you want to explore that, too.

Paste each of these into a separate session, run session 1, then quickly all the others. In about 50 seconds the test will be over. Look at the Messages from each session to see what work it did (or how it failed). The first session will show a rowset with a snapshot taken once a second detailing the locks present and the being-processed queue items. It works sometimes, and other times doesn't work at all.

Session 1

/* Session 1: Setup and control - Run this session first, then immediately run all other sessions */
IF Object_ID('dbo.Queue', 'U') IS NULL
   CREATE TABLE dbo.Queue (
      QueueID int identity(1,1) NOT NULL,
      StatusID int NOT NULL,
      QueuedDate datetime CONSTRAINT DF_Queue_QueuedDate DEFAULT (GetDate()),
      CONSTRAINT PK_Queue PRIMARY KEY CLUSTERED (QueuedDate, QueueID)
   );

IF Object_ID('dbo.QueueHistory', 'U') IS NULL
   CREATE TABLE dbo.QueueHistory (
      HistoryDate datetime NOT NULL,
      QueueID int NOT NULL
   );

IF Object_ID('dbo.LockHistory', 'U') IS NULL
   CREATE TABLE dbo.LockHistory (
      HistoryDate datetime NOT NULL,
      ResourceType varchar(100),
      RequestMode varchar(100),
      RequestStatus varchar(100),
      ResourceDescription varchar(200),
      ResourceAssociatedEntityID varchar(200)
   );

IF Object_ID('dbo.StartTime', 'U') IS NULL
   CREATE TABLE dbo.StartTime (
      StartTime datetime NOT NULL
   );

SET NOCOUNT ON;

IF (SELECT Count(*) FROM dbo.Queue) < 10000 BEGIN
   TRUNCATE TABLE dbo.Queue;

   WITH A (N) AS (SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1),
   B (N) AS (SELECT 1 FROM A Z, A I, A P),
   C (N) AS (SELECT Row_Number() OVER (ORDER BY (SELECT 1)) FROM B O, B W)
   INSERT dbo.Queue (StatusID, QueuedDate)
   SELECT 1, DateAdd(millisecond, C.N * 3, GetDate() - '00:05:00')
   FROM C
   WHERE C.N <= 10000;
END;

TRUNCATE TABLE dbo.StartTime;
INSERT dbo.StartTime SELECT GetDate() + '00:00:15'; -- or however long it takes you to go run the other sessions
GO
TRUNCATE TABLE dbo.QueueHistory;
SET NOCOUNT ON;

DECLARE
   @Time varchar(8),
   @Now datetime;
SELECT @Time = Convert(varchar(8), StartTime, 114)
FROM dbo.StartTime;
WAITFOR TIME @Time;

DECLARE @i int,
@QueueID int;
SET @i = 1;
WHILE @i <= 33 BEGIN
   SET @Now  = GetDate();
   INSERT dbo.QueueHistory
   SELECT
      @Now,
      QueueID
   FROM
      dbo.Queue Q WITH (NOLOCK)
   WHERE
      Q.StatusID <> 1;

   INSERT dbo.LockHistory
   SELECT
      @Now,
      L.resource_type,
      L.request_mode,
      L.request_status,
      L.resource_description,
      L.resource_associated_entity_id
   FROM
      sys.dm_tran_current_transaction T
      INNER JOIN sys.dm_tran_locks L
         ON L.request_owner_id = T.transaction_id;
   WAITFOR DELAY '00:00:01';
   SET @i = @i + 1;
END;

WITH Cols AS (
   SELECT *, Row_Number() OVER (PARTITION BY HistoryDate ORDER BY QueueID) Col
   FROM dbo.QueueHistory
), P AS (
   SELECT *
   FROM
      Cols
      PIVOT (Max(QueueID) FOR Col IN ([1], [2], [3], [4], [5], [6], [7], [8])) P
)
SELECT L.*, P.[1], P.[2], P.[3], P.[4], P.[5], P.[6], P.[7], P.[8]
FROM
   dbo.LockHistory L
   FULL JOIN P
      ON L.HistoryDate = P.HistoryDate

/* Clean up afterward
DROP TABLE dbo.StartTime;
DROP TABLE dbo.LockHistory;
DROP TABLE dbo.QueueHistory;
DROP TABLE dbo.Queue;
*/

Session 2

/* Session 2: Simulate an application instance holding a row locked for a long period, and eventually abandoning it. */
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
SET NOCOUNT ON;
SET XACT_ABORT ON;

DECLARE
   @QueueID int,
   @Time varchar(8);
SELECT @Time = Convert(varchar(8), StartTime + '0:00:01', 114)
FROM dbo.StartTime;
WAITFOR TIME @Time;
BEGIN TRAN;

--SET @QueueID = (
--   SELECT TOP 1 QueueID
--   FROM dbo.Queue WITH (READPAST, UPDLOCK)
--   WHERE StatusID = 1 -- ready
--   ORDER BY QueuedDate, QueueID
--);

--UPDATE dbo.Queue
--SET StatusID = 2 -- in process
----OUTPUT Inserted.*
--WHERE QueueID = @QueueID;

SET @QueueID = NULL;
UPDATE Q
SET Q.StatusID = 1, @QueueID = Q.QueueID
FROM (
   SELECT TOP 1 *
   FROM dbo.Queue WITH (ROWLOCK, READPAST)
   WHERE StatusID = 1
   ORDER BY QueuedDate, QueueID
) Q

PRINT @QueueID;

WAITFOR DELAY '00:00:20'; -- Release it partway through the test

ROLLBACK TRAN; -- Simulate client disconnecting

Session 3

/* Session 3: Run a near-continuous series of "failed" queue processing. */
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
SET XACT_ABORT ON;
SET NOCOUNT ON;
DECLARE
   @QueueID int,
   @EndDate datetime,
   @NextDate datetime,
   @Time varchar(8);

SELECT
   @EndDate = StartTime + '0:00:33',
   @Time = Convert(varchar(8), StartTime, 114)
FROM dbo.StartTime;

WAITFOR TIME @Time;

WHILE GetDate() < @EndDate BEGIN
   BEGIN TRAN;

   --SET @QueueID = (
   --   SELECT TOP 1 QueueID
   --   FROM dbo.Queue WITH (READPAST, UPDLOCK)
   --   WHERE StatusID = 1 -- ready
   --   ORDER BY QueuedDate, QueueID
   --);

   --UPDATE dbo.Queue
   --SET StatusID = 2 -- in process
   ----OUTPUT Inserted.*
   --WHERE QueueID = @QueueID;

   SET @QueueID = NULL;
   UPDATE Q
   SET Q.StatusID = 1, @QueueID = Q.QueueID
   FROM (
      SELECT TOP 1 *
      FROM dbo.Queue WITH (ROWLOCK, READPAST)
      WHERE StatusID = 1
      ORDER BY QueuedDate, QueueID
   ) Q

   PRINT @QueueID;

   SET @NextDate = GetDate() + '00:00:00.015';
   WHILE GetDate() < @NextDate SET NOCOUNT ON;
   ROLLBACK TRAN;
END

Session 4 and up — as many as you like

/* Session 4: "Process" the queue normally, one every second for 30 seconds. */
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
SET XACT_ABORT ON;
SET NOCOUNT ON;

DECLARE @Time varchar(8);
SELECT @Time = Convert(varchar(8), StartTime, 114)
FROM dbo.StartTime;
WAITFOR TIME @Time;

DECLARE @i int,
@QueueID int;
SET @i = 1;
WHILE @i <= 30 BEGIN
   BEGIN TRAN;

   --SET @QueueID = (
   --   SELECT TOP 1 QueueID
   --   FROM dbo.Queue WITH (READPAST, UPDLOCK)
   --   WHERE StatusID = 1 -- ready
   --   ORDER BY QueuedDate, QueueID
   --);

   --UPDATE dbo.Queue
   --SET StatusID = 2 -- in process
   --WHERE QueueID = @QueueID;

   SET @QueueID = NULL;
   UPDATE Q
   SET Q.StatusID = 1, @QueueID = Q.QueueID
   FROM (
      SELECT TOP 1 *
      FROM dbo.Queue WITH (ROWLOCK, READPAST)
      WHERE StatusID = 1
      ORDER BY QueuedDate, QueueID
   ) Q

   PRINT @QueueID;
   WAITFOR DELAY '00:00:01'
   SET @i = @i + 1;
   DELETE dbo.Queue
   WHERE QueueID = @QueueID;   
   COMMIT TRAN;
END

Best Answer

You need exactly 3 lock hints

  • READPAST
  • UPDLOCK
  • ROWLOCK

I answered this previously on SO: https://stackoverflow.com/questions/939831/sql-server-process-queue-race-condition/940001#940001

As Remus says, using service broker is nicer but these hints do work

Your error about isolation level usually means replication or NOLOCK is involved.