SQL Server – Duplicate Records Returned from Table with No Duplicates

duplicationisolation-levelsql serversql-server-2008-r2

I have a stored procedure that queries a busy queue table that is used to distribute work in our system. The table in question has a primary key on WorkID and no duplicates.

A simplified version of the query is:

INSERT INTO #TempWorkIDs (WorkID)
SELECT
        W.WorkID

    FROM
        dbo.WorkTable W

    WHERE
        (@bool_param = 0 AND
        ((W.InProgress = 0
         AND ISNULL(W.UserID, -1) != @userid_param
         AND (@bool_filtered = 0
              OR W.TypeID IN (SELECT TypeID FROM #Types AS t)))
         OR 
         (@bool_param = 1
          AND W.InProgress = 1
          AND W.UserID != @userid_param)
        OR
        (@Auto_Param = 0
         AND W.UserID = @userid_param)))
         OR
         (@bool_param = 1 AND W.UserID = @userid_param)
    OPTION
        (RECOMPILE)

The #Types table is populated earlier in the procedure.

As I said, WorkTable is busy, and sometimes while this query is running I SUSPECT one of the records is moving from one set of filters in the WHERE to another. Specifically, this happens when someone starts to work on an item, and the W.InProgress changes from 0 to 1. When this happens I get a duplicate key violation when I try to add a primary key to the temp table this query is inserting into.

I have confirmed in the query plan generated when the error occurs that there is no parallelism, the isolation level is READ COMMITTED, and there are no duplicate records in the source table. You can also see there are no JOINs or other way to get cartesian products in here.

This is the anonymized query plan:

enter image description here

The question is, what is causing the duplicates and how can I get it to stop?

I think READ COMMITTED should work here, I need locking. I am almost positive the dupes occur when the InProgress bit on a record changes while I'm querying. I know this because the table stores the time of that change and it's within milliseconds of when I query and get the error.

Best Answer

There are some tricky scenarios which can result in the same row being read twice from an index, even under the READ COMMITTED isolation level.

Your query does not qualify for an allocation order scan, so the storage engine will read the data from the table in the order of the clustered key.

For your table, you have the InProgress as the first column of the clustered key. It's likely that you're getting row or page locks as you scan through the table. If you read a row near the start of the scan, release the lock on it, that row is updated such that InProgress changes from 0 to 1, and then the row is read again in a different page then you could see duplicate WorkID values from your query.

There are lots of workarounds. You could insert into a heap and simply remove duplicate values. You could add a DISTINCT to the query. You could also enable a row-versioning isolation level, to provide a stable view of the committed state of the database, either as at the beginning of the transaction (snapshot isolation), or as at the beginning of the statement (read committed snapshot isolation).

Perhaps it's appropriate to add locking hints or to change the structure of the table. For a rather fun solution (probably not appropriate for production), you could try reading the index backwards. This can be done with a superfluous TOP along with an ORDER BY. Below is a very simple demo to illustrate the point:

CREATE TABLE #WorkTable (
    InProgress TINYINT NOT NULL,
    WorkID INT NOT NULL
    , PRIMARY KEY (InProgress, WorkID)
);

INSERT INTO #WorkTable WITH (TABLOCK)
SELECT (RN - 1) / 5000, RN
FROM
(
    SELECT TOP (10000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) RN
    FROM master..spt_values t1
    CROSS JOIN master..spt_values t2
) t
OPTION (MAXDOP 1);

The following query has the Ordered:false property but it'll still read the data in clustered key order:

SELECT WorkId
FROM #WorkTable;

However, the following query will read the data in reverse clustered order:

SELECT TOP (9223372036854775807) WorkId
FROM #WorkTable
ORDER BY InProgress DESC, WorkId DESC;

We can see this by looking at the scan properties:

backwards scan

For your table, this means that if a row is updated such that InProgress changes from 0 to 1 it will be far less likely that it'll show up twice. It may not show up at all which could be a different problem.