SQL Server Concurrency – Will My SQL Query Use Stale Data and How to Prevent It?

concurrency, isolation-level, sql-server

I have two tables (SJob & SJobDependent) that I need to join for some logic in a stored procedure. They both have a column (job) that connects them in a one-to-many relationship – one SJob record for zero or more SJobDependent records.

Here is my SQL query:

-- Return any records that are active and have no unsatisfied dependencies.
SELECT * FROM SJob
LEFT JOIN SJobDependent
    ON SJob.job = SJobDependent.job
    AND SJobDependent.satisfied = 0
WHERE SJobDependent.jobDependentID IS NULL
AND SJob.state = 'active'

Here is the Actual Execution Plan from SQL Server Management Studio (image not reproduced here):

Due to the way the code is written:

// Pseudo-code:

// SJob record is added with SJob.state = 'ready'.

// Related SJobDependent record(s) are added.

// SJob record is updated to SJob.state = 'active'.

I fear that this may happen when the SQL query runs:

1. Scan SJobDependent.
2. SJobDependent record(s) inserted.
3. Start scan of SJob. SJob.state is 'ready'.
4. SJob is updated. This blocks reading of SJob?
5. End scan of SJob. SJob.state is 'active'.

The problem I fear is that my SQL query returns SJob records found in the "active" state (SJob.state = 'active'), but fails to see the related SJobDependent records.

Can this problem actually happen, or am I over-analyzing the SQL query?

If this is a legitimate problem to worry about, what can I do to solve it? I'm open to solutions.

One idea I've had is to force the scan of SJobDependent to occur after the scan of SJob. Is this even possible? What are the implications/consequences of doing this?

Do the scans shown in the Actual Execution Plan occur in a particular order or is it always random from call-to-call?

NOTE: As noted in AMtwo's answer, the Repeatable Read isolation level will probably not solve my problem, because it only protects data once that data has been read for the first time.

Best Answer

If you're using the default isolation level in SQL Server (Read Committed), then you certainly can run into all sorts of issues around inconsistent reads. Paul White describes the problems here.

If you want your read queries to read data which is fully consistent to how it looked at a given point in time, I'd recommend that you consider Read Committed Snapshot Isolation (RCSI). With RCSI, your query will return data that is consistent to a single point in time (the start of your query). If User A starts a SELECT query while User B is concurrently performing updates, User A will read the "old" value because it will read a snapshot of the data, which is consistent to the start of the query.

The catch with RCSI is that it's a database-level setting. Unlike Read Uncommitted, you can't set it as a session-scoped setting, so you'll have to consider its effect on the whole database before making the change. However, generally speaking, if you require consistent reads for this query, you probably want consistent reads for the entire application.
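As a minimal sketch of what that database-level change looks like, the statements below turn RCSI on and then check the setting. The database name YourDatabase is a placeholder, not something from the question, and the ROLLBACK IMMEDIATE option is an assumption about how you want to handle other open connections; it rolls back their in-flight transactions, so this is best run in a maintenance window.

```sql
-- Hypothetical database name; replace with your own.
-- Switching RCSI on needs exclusive access to the database for a moment;
-- WITH ROLLBACK IMMEDIATE rolls back open transactions in other sessions
-- instead of waiting for them to finish.
ALTER DATABASE [YourDatabase]
SET READ_COMMITTED_SNAPSHOT ON
WITH ROLLBACK IMMEDIATE;

-- Confirm the setting: 1 means RCSI is on.
SELECT name, is_read_committed_snapshot_on
FROM sys.databases
WHERE name = N'YourDatabase';
```

Once this is on, existing queries running under the default Read Committed level read row versions instead of taking shared locks, so the stored procedure itself does not need to change.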

The Repeatable Read isolation level may look like an appealing way to solve your problem, but note this detail from the linked post:

The repeatable read isolation level provides a guarantee that data will not change for the life of the transaction once it has been read for the first time.

This means that data can still change while your query is running, as long as the change happens before the query reads that data for the first time. Repeatable Read is also subject to some of the same inconsistent reads as the Read Committed isolation level, notably phantoms.

Related Solutions

Sql-server – Slow delete caused by many foreign keys

Make sure that there are indexes on the constrained columns, since the DBMS will do a lookup using those columns in the referencing tables.

You can try this script. It generates a CREATE INDEX statement for every non-indexed column that is referenced by a cascading constraint.

-- Generate CREATE INDEX statements for foreign key columns that are not a key
-- column of any index and whose constraint cascades (or sets NULL) on delete.
-- ONLINE=ON is not available in every edition; drop it if it is rejected.
SELECT 'CREATE NONCLUSTERED INDEX IX_'+OBJECT_NAME(fk.parent_object_id)+'_'+c.name+' ON '+OBJECT_NAME(fk.parent_object_id)+'('+c.name+') WITH (ONLINE=ON)'
        --, OBJECT_NAME(fk.referenced_object_id) AS referenced_table, cc.name
FROM
    sys.foreign_keys fk
    INNER JOIN sys.foreign_key_columns fkc ON fk.object_id = fkc.constraint_object_id
    INNER JOIN sys.columns c ON fkc.parent_object_id = c.object_id AND fkc.parent_column_id = c.column_id
    INNER JOIN sys.columns cc ON fkc.referenced_object_id = cc.object_id AND fkc.referenced_column_id = cc.column_id
WHERE fk.delete_referential_action_desc IN ('CASCADE', 'SET_NULL')
    AND NOT EXISTS
        (
        -- The referencing column is already a key column of some index.
        SELECT 1
        FROM
            sys.index_columns ic
            INNER JOIN sys.indexes i ON i.object_id = ic.object_id AND ic.index_id = i.index_id
        WHERE
            i.type_desc IN ('CLUSTERED','NONCLUSTERED')
            AND ic.object_id = c.object_id
            AND ic.column_id = c.column_id
            AND ic.is_included_column = 0
        )

Sql-server – Concurrent reads and updates

My favourite way of achieving this is the OUTPUT clause.
Here is an example:

SET NOCOUNT ON;

-- example setup for the queue table
DECLARE @queue TABLE (
    transaction_id int PRIMARY KEY,
    processed bit
);


-- some sample data
INSERT INTO @queue 
VALUES
    (1,0),
    (2,0),
    (3,0),
    (4,0),
    (5,0);

Now the processing thread:

-- processing thread
WHILE 1 = 1
BEGIN 
    DECLARE @transaction_id int;

    DECLARE @row_to_process TABLE (
        transaction_id int
    );

    BEGIN TRAN;

    BEGIN TRY

        DELETE FROM @row_to_process;

        UPDATE Q
        SET processed = 1
        OUTPUT inserted.transaction_id 
            INTO @row_to_process
        FROM (
            SELECT TOP(1) transaction_id, processed
            FROM @queue
            WHERE processed = 0
            ORDER BY transaction_id
        ) AS Q;

        IF @@ROWCOUNT = 0
        BEGIN
            COMMIT;
            -- sleep for 5 seconds then restart the loop
            WAITFOR DELAY '00:00:05'; 
            CONTINUE;
        END

        SELECT @transaction_id = transaction_id
        FROM @row_to_process;

        IF @transaction_id IS NOT NULL
        BEGIN
            RAISERROR(N'processing transaction N. %d.',1,1, @transaction_id) WITH NOWAIT;
            EXEC whatever_processes_the_row @transaction_id;
        END

        COMMIT;

    END TRY
    BEGIN CATCH
        ROLLBACK;
        THROW;
    END CATCH
END

If you're marking the row as processed by deleting it, it works in the same way: the only thing you have to do is change the UPDATE statement into a DELETE statement and pull the current @transaction_id from the DELETED logical table.
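The DELETE variant described above might look like the following sketch. It reuses the @queue and @row_to_process names from the example but drops the processed column, since deleting the row is what marks it as done. The CTE form is just one way to pick the first row; the sample data is illustrative only.

```sql
SET NOCOUNT ON;

-- Example queue: a row's presence means it is still waiting to be processed.
DECLARE @queue TABLE (
    transaction_id int PRIMARY KEY
);

INSERT INTO @queue
VALUES (1), (2), (3);

DECLARE @row_to_process TABLE (
    transaction_id int
);

DECLARE @transaction_id int;

-- Remove the oldest row and capture its id in the same statement.
WITH Q AS (
    SELECT TOP(1) transaction_id
    FROM @queue
    ORDER BY transaction_id
)
DELETE FROM Q
OUTPUT deleted.transaction_id
    INTO @row_to_process;

SELECT @transaction_id = transaction_id
FROM @row_to_process;

-- @transaction_id now holds the id of the row that was just dequeued.
SELECT @transaction_id AS dequeued_transaction_id;
```

In the processing loop, this statement would take the place of the UPDATE, with the rest of the transaction handling left as it is.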
