SQL Server Concurrency – Will My SQL Query Use Stale Data and How to Prevent It?

concurrencyisolation-levelsql server

I have two tables (SJob & SJobDependent) that I need to join for some logic in a stored procedure. They both have a column (job) that connects them in a one-to-many relationship – one SJob record for zero or more SJobDependent records.

Here is my SQL query:

-- Return any records that are active and have no unsatisfied dependencies.
SELECT * FROM SJob
LEFT JOIN SJobDependent
    ON SJob.job = SJobDependent.job
    AND SJobDependent.satisfied = 0
WHERE SJobDependent.jobDependentID IS NULL
AND SJob.state = 'active'

Here is the Actual Execution Plan from SQL Server Studio:

SQL Server Studio Actual Execution Plan

Due to the way the code is written:

// Pseudo-code:

// SJob record is added with SJob.state = 'ready'.

// Related SJobDependent record(s) are added.

// SJob record is updated to SJob.state = 'active'.

I fear that this may happen when the SQL query runs:

  1. Scan SJobDependent.
  2. SJobDependent record(s) inserted.
  3. Start scan of SJob. SJob.state is 'ready'.
  4. SJob is updated. This blocks reading of SJob?
  5. End scan of SJob. SJob.state is 'active'.

The problem I fear is that my SQL query returns SJob records found in the "active" state (SJob.state = 'active'), but fails to see the related SJobDependent records.

Is this problem capable of happening, or am I over-analyzing the SQL query?

If this is a legitimate problem to worry about, what can I do to solve it? I'm open to solutions.

One idea I've had is to force the scan of SJobDependent to occur after the scan of SJob. Is this even possible? What are the implications/consequences of doing this?

Do the scans shown in the Actual Execution Plan occur in a particular order or is it always random from call-to-call?

NOTE: As noted in AMtwo's answer, Repeatable Read isolation level will probably not solve my problem, due to the fact that it only takes effect when the read starts.

Best Answer

If you're using the default isolation level in SQL Server (Read Committed), then you certainly can run into all sorts of issues around inconsistent reads. Paul White describes the problems here.

If you want your read queries to read data which is fully consistent to how it looked at a given point in time, I'd recommend that you consider Read Committed Snapshot Isolation (RCSI). With RCSI, your query will return data that is consistent to a single point in time (the start of your query). If User A starts a SELECT query while User B is concurrently performing updates, User A will read the "old" value because it will read a snapshot of the data, which is consistent to the start of the query.

The catch with RCSI is that it's a database-level setting. Unlike Read Uncommitted, you can't set it as a session-scoped setting. You'll have to consider this change more globally before making the change. However generally speaking, if you require consistent reads for this query, you probably want consistent reads for the entire application.

While the Repeatable Read isolation level may look appealing to solve your problem, but note this detail from the linked post:

The repeatable read isolation level provides a guarantee that data will not change for the life of the transaction once it has been read for the first time.

This means that the data can still be changed prior to being accessed, but during the time your query is running. It is also subject to some of the same inconsistent reads as the Read Committed isolation level--notably phantoms.