How to Avoid Race Conditions with Cross DB Proc Calls in SQL Server

sql serversql-server-2012stored-procedurest-sql

I am not even sure this question is necessary but I am curious to know everyone's thoughts. I have two databases on the same server, dbFoo, dbBar. dbFoo has the following table please note this is a way dumbed down example, and the syntax may not be correct as I am rushing and way more interested in the answer to the underlying issue then the code to do it…

CREATE TABLE dbo.CodeNumbers(
CodeNumbersID INT IDENTITY (1,1) NOT NULL PRIMARY KEY,
CodeValue VARCHAR(30) NOT NULL
IsUsed BIT NOT NULL DEFAULT(0)
);

dbo.CodeNumbers gets populated with a monthly CSV provided, the import method of your choice is already written to get them in there. We are NEVER given a duplicate code.

Lets assume for arguments sake we have 10,000,000 rows in the table. that all follow this format when imported:

1, 'ajdirjfisofklrlfo039402', 0 all the way till
10000000, 'fkeiir9489', 0

Now in dbBar I have 2 stored procedures the first should access the first un-used code in dbFoo, return it in an out variable and mark it as used. So I have something like:

CREATE PROCEDURE GetNextUseableCode
   @CodeOut VARCHAR(30) OUTPUT,
   @CID INT OUTPUT
AS
   SELECT @CID = CodeNumbersID, @CodeOut = CodeValue
   FROM dbFoo.dbo.CodeNumbers
   WHERE IsUsed = 0

   UPDATE dbFoo.dbo.CodeNumbers
   SET IsUsed = 1
   WHERE CodeNumbersID = @CID 

The code that calls the procedure from dbBar is accessed by 200k sessions a day at various times. When dbFoo.Codes has no more to return, its fine all is well and good, the application is simply told sorry no more check back tomorrow.

I have 3 main questions..

  1. Is there anything special I would need to have in the code to avoid race conditions, and if so what would be best to handle for this without bringing the system to its knees.

  2. Is their an efficient way to ensure the next code grabbed whenever the procedure is called, is the next one in chronological order within the ID column.

  3. Is there any other concerns that I am not looking at right now that could issue big problems, and what would be an eloquent way to handle this situation?

I understand this is a long pretty open ended question, I have some coded up solutions, but I feel like there are much better ways to get the results I want.

Thanks in advance as always for all help.

Best Answer

There is nothing in your SELECT that dictates order at all. It is also not protected from two sessions reading the same row. To see that it is not safe:

  1. Create a throwaway table with a key column.

    CREATE TABLE dbo.GeneratedIDs(ID INT PRIMARY KEY);
    
  2. Run this code in a loop from two different sessions:

    SET NOCOUNT ON;
    
    DECLARE @DECLARE @CID INT, @CodeOUt VARCHAR(30);
    
    SELECT @CID = CodeNumbersID, @CodeOut = CodeValue
      FROM dbFoo.dbo.CodeNumbers
      WHERE IsUsed = 0
    
    UPDATE dbFoo.dbo.CodeNumbers
      SET IsUsed = 1
      WHERE CodeNumbersID = @CID 
    
    INSERT dbo.GeneratedIDs SELECT @CID;
    
    GO 100000
    
  3. Check the output for these - they'll happen:

    Msg 2627, Level 14, State 1
    Violation of PRIMARY KEY constraint 'PK_hexcode'. Cannot insert duplicate key in object 'dbo.GeneratedIDs'. The duplicate key value is (<some value>).

    Or you might see deadlocks if you wrap the SELECT/UPDATE in an explicit transaction:

    Msg 1205, Level 13, State 52
    Transaction (Process ID 57) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.

To get around this problem, and to ensure that the ID you get is the lowest one available, you could do this:

BEGIN TRANSACTION;

SELECT TOP (1) @CID = ...
FROM dbFoo.dbo.CodeNumbers WITH (XLOCK, HOLDLOCK)
WHERE IsUsed = 0
ORDER BY CodeNumbersID;

UPDATE ...

COMMIT TRANSACTION;

Note that I added an explicit transaction and also XLOCK/HOLDLOCK hints to prevent two simultaneous sessions from reading the same row. Of course, this has an impact on concurrency (which, unfortunately, is exactly what you want and need here).

Other ways to do this include just updating the row and then using a table variable to capture values from the OUTPUT clause:

DECLARE @x TABLE(CodeOut VARCHAR(30), CID INT);

;WITH x AS 
(
  SELECT TOP (1) CodeNumbersID, CodeValue, IsUsed
    FROM dbFoo.dbo.CodeNumbers WITH (XLOCK, HOLDLOCK)
    WHERE IsUsed = 0
    ORDER BY CodeNumbersID
)
UPDATE x SET IsUsed = 1
  OUTPUT inserted.CodeValue, inserted.CodeNumbersID INTO @x;

SELECT @CodeOut = CodeOut, @CID = CID FROM @x;

Per Paul's update, yes, you could also do this without the table variable:

;WITH x AS 
(
  SELECT TOP (1) CodeNumbersID, CodeValue, IsUsed
    FROM dbFoo.dbo.CodeNumbers WITH (XLOCK, HOLDLOCK)
    WHERE IsUsed = 0
    ORDER BY CodeNumbersID
)
UPDATE x 
  SET @CodeOut = x.CodeValue, @CID = x.CodeNumbersID, IsUsed = 1;

(Though I am not all that fond of this syntax; not sure why. Might be the same reason I always forget it exists.)

You could change the caller to expect a resultset instead of two output parameters, but that is extra work too. In both cases you'd still need to ensure you got the lowest ID available, which likely means a CTE with a SELECT and the same hints. Some discussion here. I also talk about a similar approach in this blog post but I didn't get into anything about concurrency and two sessions trying to delete the same row at the same time. Obviously in that case only one of them can win, but with an UPDATE they could both manage to be successful (at least in theory).

To make things easier, you could relax the restriction that the "next" ID handed out is the lowest ID available. But you still need the isolation through the hints to ensure that two simultaneous sessions don't happen to read the same value (which could happen regardless of ordering). Hopefully with a suitable index these won't destroy concurrency.