SQL Server – Is it safe to use WITH (NOLOCK) in this case?

Tags: concurrency, deadlock, nolock, sql-server

Scenario

I have a table that handles concurrent SELECTs and DELETEs. I'm getting occasional deadlocks on my SELECT statements. I assume the DELETE from the other transaction takes an exclusive lock that conflicts with the shared lock taken by the SELECT.
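To confirm which statements and locks are actually involved, the deadlock graphs can be pulled from the built-in `system_health` Extended Events session (available on RDS as well). This is a common diagnostic recipe, sketched here; the session and target names are the SQL Server defaults:

```sql
-- Sketch: extract recent deadlock graphs from the system_health ring buffer.
SELECT
    xed.value('@timestamp', 'datetime2') AS deadlock_time,
    xed.query('.')                       AS deadlock_graph
FROM (
    SELECT CAST(st.target_data AS xml) AS target_data
    FROM sys.dm_xe_session_targets AS st
    JOIN sys.dm_xe_sessions AS s
        ON s.address = st.event_session_address
    WHERE s.name = 'system_health'
      AND st.target_name = 'ring_buffer'
) AS tab
CROSS APPLY target_data.nodes(
    'RingBufferTarget/event[@name="xml_deadlock_report"]') AS x(xed);
```

The `deadlock_graph` XML names the victim, the resources, and the lock modes on each side, which verifies (or refutes) the exclusive-vs-shared-lock assumption above.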

Details

  • SQL Server 14.00.3281.6.v1
  • Running on AWS RDS

Use Case

  • My app can trigger a SELECT in a variety of ways.
  • If the app triggers a DELETE, it will (always) then trigger a SELECT to retrieve the results that reflect the effects of the DELETE.

In the case of a concurrent SELECT and DELETE, it might look like this (drawn in MS Paint, because I try to be professional…):

[timeline diagram]

Edit: By "trigger" above I don't mean an actual db trigger, but just application behavior.

Research

I poked around here on the DBA Stack Exchange and found that I could SELECT … FROM myTable WITH (NOLOCK), which avoids taking the shared lock. I'm considering using this, but I know there are lots of caveats and gotchas, so I want to validate my decision or replace it if necessary.

I'm new to WITH (NOLOCK) so here's what I've learned from this helpful site:

the WITH (NOLOCK) table hint retrieves the rows without waiting for the other queries that are reading or modifying the same data to finish their processing.
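For concreteness, here is what the two equivalent forms look like; the table and column names are placeholders for my actual schema:

```sql
-- The hint applies per table reference:
SELECT Id, Status
FROM dbo.myTable WITH (NOLOCK)   -- no shared locks taken; may see uncommitted rows
WHERE Status = 'Pending';

-- Session-level equivalent: every read in the session behaves like NOLOCK.
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

SELECT Id, Status
FROM dbo.myTable
WHERE Status = 'Pending';
```

Note that the hint must be repeated on every table reference in a join, whereas the SET option covers the whole session.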

Justifications

These quotes are from the same link. Under each one I've described my thinking in concluding that the behavior won't affect me.

In general, using explicit table hints frequently is considered a bad practice that you should generally avoid. For the NOLOCK table hint specifically, reading uncommitted data that could be rolled back after you have read it can lead to a Dirty read: you may read data that is being modified or deleted mid-transaction, so the data you read could be different from the committed state, or never have existed at all.

Dirty Read: I don't think I need to care about this because the SELECT is not actually invalid because of the dirty read. Any DELETE that caused a dirty read will then trigger a new SELECT that will correct the final result. I consider both SELECT results valid, though one was only valid for a few milliseconds.

The WITH (NOLOCK) table hint also leads to Nonrepeatable reads; this read occurs when it is required to read the same data multiple times and the data changes during these readings. In this case, you will read multiple versions of the same row.

Nonrepeatable Read: My query reads from that table only once, so I assume this isn't an issue.

Phantom reads can be also a result of using the WITH(NOLOCK) table hint, in which you will get more records when the transaction that is inserting new records is rolled back, or fewer records when the transaction that is deleting existing data is rolled back.

Phantom Read: Same as Dirty Read

Another problem may occur when other transactions move data you have not yet read to a location that you have already scanned, or add new pages to a location you have already scanned. In this case, you will miss these records and will not see them in the returned result. If another transaction moves data that you have already scanned to a new location that you have not read yet, you will read that data twice.

I'm not sure if this affects my use case or not, but I assume just deleting a few rows isn't going to cause this level of page reshuffling. I really don't know what I'm talking about on this topic though, so maybe I'm way off.

Question

Is this use case one of those rare ones that actually can make safe use of WITH (NOLOCK) or is there still some danger I should consider?

Initial Results

While writing this question I tried this in our dev environment, and it seems deadlocks are still happening somewhere. I'd still like to understand whether this approach is valid and whether it may be part of the overall deadlock solution.

Other Ideas

I found this suggestion to use the SNAPSHOT isolation level, but I am unsure if I should have to bear the performance penalty when my use case can withstand side effects like dirty reads.

Disclaimer:
I'm not a DBA, but a software dev with a few years of DB experience. I'm only "textbook-level" familiar with the inner workings of locking, pages, hints, etc., so please bear with me and let me know if I can improve the question in any way. I'm happy to clarify if needed.

Best Answer

I'm not a DBA, but a software dev with a few years of DB experience. I'm only "textbook-level" familiar with the inner workings of locking, pages, hints,

Then you should use SNAPSHOT isolation, or set your database to READ COMMITTED SNAPSHOT, because it's fundamentally simpler to write correct, scalable, deadlock-free code.
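As a sketch of how to turn these on (the database name is a placeholder, and `WITH ROLLBACK IMMEDIATE` is one option for dealing with active sessions blocking the change):

```sql
-- READ COMMITTED SNAPSHOT changes the default behavior of READ COMMITTED
-- for the whole database: readers see the last committed version instead of
-- blocking on writers' locks. No application changes needed.
ALTER DATABASE MyDb SET READ_COMMITTED_SNAPSHOT ON WITH ROLLBACK IMMEDIATE;

-- SNAPSHOT isolation must be both allowed at the database level and
-- explicitly requested per session/transaction:
ALTER DATABASE MyDb SET ALLOW_SNAPSHOT_ISOLATION ON;

SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
BEGIN TRANSACTION;
    -- all reads here see a consistent snapshot as of transaction start
COMMIT;
```

For the use case described above, READ COMMITTED SNAPSHOT is usually the simpler choice, since existing queries pick it up without modification.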

It's a common misconception that NOLOCK/READ UNCOMMITTED relaxes the concurrency model in a predictable way, i.e., that it simply allows you to read data in a state that might eventually be rolled back. That is not the case. Rows sometimes need to move when they are changed, and a NOLOCK query might miss such rows or read them multiple times. A NOLOCK query might also read a non-clustered index and the underlying table when one was updated but the other was not. Both of these can produce results that are simply wrong.

The cost of READ COMMITTED SNAPSHOT/SNAPSHOT is that rows need a little extra bookkeeping. A 14-byte pointer is added to updated and deleted rows, pointing to the previous row version, and the row versions themselves are stored either in TempDb or in the user database. But there's a performance benefit too: your workload can scale more easily, as there is less locking contention and sessions can run more concurrently.
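If the version-store overhead is a concern, it can be observed directly. A minimal sketch, using a DMV that exists in builds matching the 14.00 version mentioned above:

```sql
-- Per-database space consumed by the row-version store in TempDb.
SELECT database_id,
       reserved_page_count,
       reserved_space_kb
FROM sys.dm_tran_version_store_space_usage
ORDER BY reserved_space_kb DESC;
```

In practice the overhead is modest for short transactions; it grows mainly when long-running transactions force old row versions to be retained.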