SQL Server Locking – Strategies for Checking Out Records for Processing

lockingsql serversql-server-2008-r2

I'm not sure if there is a named pattern for this, or if there isn't because it's a terrible idea. But I need my service to operate in an active/active load balanced environment. This is the applicaiton server only. The database will be on a separate server. I have a service that will need to run through a process for each record in a table. This process can take a minute or two, and will repeat every n minutes (configurable, usually 15 minutes).

With a table of 1000 records that needs this processing, and two services running against this same data set, I would like to have each service "check out" a record to process. I need to make sure that only one service/thread is processing each record at a time.

I have colleagues that have used a "lock table" in the past. Where a record is written to this table to logically lock the record in the other table (that other table is pretty static btw, and with a very occassional new record added), and then deleted to release the lock.

I'm wondering if it wouldn't be better for the new table have a column that indicates when it was locked, and that it is currently locked, instead of inserting an deleting constantly.

Does anyone have an tips for this kind of thing? Is there an established pattern for long(ish) term logical locking? Any tips for how to ensure only one service grabs the lock at a time? (My colleague uses TABLOCKX to lock the entire table.)

Best Answer

I am not a huge fan of either the extra "lock" table or the idea of locking the whole table to grab the next record. I get why it is being done, but that also hurts concurrency for operations that are updating to release a locked record (surely two processes can't be fighting over that when it is not possible for two processes to have locked the same record at the same time).

My preference would be to add a ProcessStatusID (typically TINYINT) column to the table with the data being processed. And is there a field for LastModifiedDate? If not, then it should be added. If yes, then do these records get updated outside of this processing? If records can be updated outside of this particular process, then another field should be added to track StatusModifiedDate (or something like that). For the rest of this answer I will just use "StatusModifiedDate" as it is clear in its meaning (and in fact, could be used as the field name even if there is currently no "LastModifiedDate" field).

The values for ProcessStatusID (which should be placed into a new lookup table called "ProcessStatus" and Foreign Keyed to this table) could be:

  1. Completed (or even "Pending" in this case as both mean "ready to be processed")
  2. In Process (or "Processing")
  3. Error (or "WTF?")

At this point it seems safe to assume that from the application, it just wants to grab the next record to process and won't be passing anything in to help make that decision. So we want to grab the oldest (at least in terms of StatusModifiedDate) record that is set to "Completed" / "Pending". Something along the lines of:

SELECT TOP 1 pt.RecordID
FROM   ProcessTable pt
WHERE  pt.StatusID = 1
ORDER BY pt.StatusModifiedDate ASC;

We also want to update that record to "In Process" at the same time to prevent the other process from grabbing it. We could use the OUTPUT clause to let us do the UPDATE and SELECT in the same transaction:

UPDATE TOP (1) pt
SET    pt.StatusID = 2,
       pt.StatusModifiedDate = GETDATE() -- or GETUTCDATE()
OUTPUT INSERTED.RecordID
FROM   ProcessTable pt
WHERE  pt.StatusID = 1;

The main problem here is that while we can do a TOP (1) in an UPDATE operation, there is no way to do an ORDER BY. But, we can wrap it in a CTE to combine those two concepts:

;WITH cte AS
(
   SELECT TOP 1 pt.RecordID
   FROM   ProcessTable pt (READPAST, ROWLOCK, UPDLOCK)
   WHERE  pt.StatusID = 1
   ORDER BY pt.StatusModifiedDate ASC;
)
UPDATE cte
SET    cte.StatusID = 2,
       cte.StatusModifiedDate = GETDATE() -- or GETUTCDATE()
OUTPUT INSERTED.RecordID;

The obvious question is whether or not two processes doing the SELECT at the same time can grab the same record. I am pretty sure that the UPDATE with OUTPUT clause, especially combined with the READPAST and UPDLOCK hints (see below for more details), will be fine. However, I have not tested this exact scenario. If for some reason the above query does not take care of the race condition, then adding the following will: application locks.

The CTE query above can be wrapped in sp_getapplock and sp_releaseapplock to create a "gate keeper" for the process. In doing so, only one process at a time will be able to enter in order to run the query above. The other process(es) will be blocked until the process with the applock releases it. And since this step of the overall process is just to grab the RecordID, it is fairly quick and won't be blocking the other process(es) for very long. And, just as with the CTE query, we are not blocking the entire table, thereby allowing other updates to other rows (to set their status to either "Completed" or "Error"). Essentially:

BEGIN TRANSACTION;
EXEC sp_getapplock @Resource = 'GetNextRecordToProcess', @LockMode = 'Exclusive';

   {CTE UPDATE query shown above}

EXEC sp_releaseapplock @Resource = 'GetNextRecordToProcess';
COMMIT TRANSACTION;

Application locks are very nice but should be used sparingly.

Lastly, you just need a stored procedure to handle setting the status to either "Completed" or "Error". And that can be a simple:

CREATE PROCEDURE ProcessTable_SetProcessStatusID
(
   @RecordID INT,
   @ProcessStatusID TINYINT
)
AS
SET NOCOUNT ON;

UPDATE pt
SET    pt.ProcessStatusID = @ProcessStatusID,
       pt.StatusModifiedDate = GETDATE() -- or GETUTCDATE()
FROM   ProcessTable pt
WHERE  pt.RecordID = @RecordID;

Table Hints (found at Hints (Transact-SQL) - Table):

  • READPAST (seems to fit this exact scenario)

    Specifies that the Database Engine not read rows that are locked by other transactions. When READPAST is specified, row-level locks are skipped. That is, the Database Engine skips past the rows instead of blocking the current transaction until the locks are released...READPAST is primarily used to reduce locking contention when implementing a work queue that uses a SQL Server table. A queue reader that uses READPAST skips past queue entries locked by other transactions to the next available queue entry, without having to wait until the other transactions release their locks.

  • ROWLOCK (just to be safe)

    Specifies that row locks are taken when page or table locks are ordinarily taken.

  • UPDLOCK

    Specifies that update locks are to be taken and held until the transaction completes. UPDLOCK takes update locks for read operations only at the row-level or page-level.