I have below two queries on Production:
Query # 1: Script that deletes older historical data from core table "Table1" based on "Date" column
There are ~20 tables that point to "Table1" by foreign key with cascade rule, so when you delete from "Table1", data is deleted from ~20 other tables, too
Script deletes data in batches 50 rows at a time with 1 second interval between batches
Script does NOT cause Exclusive locks to escalate to table-level lock, when batch is only 50 rows; it only places row/page level (X) locks
Query # 2: Production Application that pulls data from "Table1", "Table2", "Table3" and some other tables
This is heavy read query, that places Shared locks on tables. It does not do any insert/update/delete
Problem:
Sometimes, deadlock occurs when Query # 2
(Prod app) reads data, and I simultaneously run Query # 1
(script deleting older data) in SSMS. Query # 2
is always a victim, as I understand it is because it is NOT doing any changes to transaction log, while Query # 1
does
Trying to resolve this, I've added set deadlock_priority -10
to Query # 1
, hoping that it will make Query # 1
victim in any possible deadlocks
First it seemed it worked, Query # 1
became victim in deadlock situations, throwing error 1205 when deleting some of the batches.
BUT – recently I discovered that sometimes Query # 2
still becomes a victim
Question:
How is this possible that Query # 2
sometimes becomes a victim even after I've set deadlock_priority -10
for Query # 1
?
Is there a way to fix it and make sure Query # 1
will be the victim in 100 % cases ?
Update: added graph below
Best Answer
You could attempt to solve the issue by changing the transaction isolation level. Specifically enabling Snapshot isolation and setting the Read Committed Snapshot so that reads and writes won't interfere with each other. You could also just update the production query to use snapshot isolation without updating the default database isolation level.
Be careful though and thoroughly test this because it will increase the traffic on tempdb as row snapshots are written. Setting Read Committed Snapshot is also a default behavior change across the whole database which needs regression testing to verify there are no unexpected side effects.
A better explanation on this option is available from Microsoft here: https://docs.microsoft.com/en-us/dotnet/framework/data/adonet/sql/snapshot-isolation-in-sql-server