if anyone could please help me understand why the following transactions deadlock?
I am providing the 2 transactions and the indexes for each table:
Transaction 1:
SELECT blockId, blockData, syncedToGeneration, itemId
FROM blocks
WHERE indexId=@indexId
and itemId IN (SELECT itemId
FROM KeywordReferences
WHERE indexId=@indexId
AND keywordRootId IN (360,4498,359,1229))
Transaction 2:
UPDATE blocks
SET blockData=@blockData,
syncedToGeneration=@syncedToGeneration
WHERE indexId=@indexId
AND blockId=@blockId
(please notice that the 'IN' section in the first transaction is much longer and contains about 30 values, which i truncated for sake of readability)
The blocks table has the following indexes:
– indexId->blockId (Clustered)
– indexId->itemId
indexId->itemId
The keywordReferences table has the following indexes:
– indexId_>keywordRootId (Clustered)
– indexId->keywordRootId->score
– indexId->itemId
– indexId->blockId
Bellow is the deadlock graph xml output:
<deadlock-list>
<deadlock victim="process5a274c8">
<process-list>
<process id="process5a274c8" taskpriority="0" logused="0" waitresource="PAGE: 5:1:91671" waittime="2709" ownerId="122348" transactionname="SELECT" lasttranstarted="2012-04-19T21:23:26.680" XDES="0xf382aca0" lockMode="S" schedulerid="4" kpid="10992" status="suspended" spid="69" sbid="0" ecid="0" priority="0" trancount="0" lastbatchstarted="2012-04-19T21:23:26.650" lastbatchcompleted="2012-04-19T21:23:26.647" clientapp=".Net SqlClient Data Provider" hostname="AMIT-PC" hostpid="6752" loginname="sa" isolationlevel="read committed (2)" xactid="122348" currentdb="5" lockTimeout="4294967295" clientoption1="671088672" clientoption2="128056">
<executionStack>
<frame procname="adhoc" line="1" stmtstart="34" sqlhandle="0x020000002e9a3633b816ffc89dc234b4c0351887e4e1b2cf">
SELECT blockId, blockData, syncedToGeneration, itemId FROM blocks WHERE indexId=@indexId and itemId IN (SELECT itemId FROM KeywordReferences WHERE indexId=@indexId AND keywordRootId IN (360,4498,359,1229,2143,14330,7661,3755,1156,21490,5567,1933,429,28197,2,3165,524,3182,2655,27262,17407,2673,570,1478,3802,6838,19668,17,6586,2484,2794,1640,5171,2558,6592,5833,695,1199,2307,335,1351,6651,6899,3740,7048,22030,14356,597,3175,3965,3297,2711,14484,2761,2265,28,1647,3223,226,304,298,1157,197,2696,21172,19149,9,1159,135,1,3166,23325)) </frame>
<frame procname="unknown" line="1" sqlhandle="0x000000000000000000000000000000000000000000000000">
unknown </frame>
</executionStack>
<inputbuf>
(@indexId bigint)SELECT blockId, blockData, syncedToGeneration, itemId FROM blocks WHERE indexId=@indexId and itemId IN (SELECT itemId FROM KeywordReferences WHERE indexId=@indexId AND keywordRootId IN (360,4498,359,1229,2143,14330,7661,3755,1156,21490,5567,1933,429,28197,2,3165,524,3182,2655,27262,17407,2673,570,1478,3802,6838,19668,17,6586,2484,2794,1640,5171,2558,6592,5833,695,1199,2307,335,1351,6651,6899,3740,7048,22030,14356,597,3175,3965,3297,2711,14484,2761,2265,28,1647,3223,226,304,298,1157,197,2696,21172,19149,9,1159,135,1,3166,23325)) </inputbuf>
</process>
<process id="process5a13b88" taskpriority="0" logused="215304" waitresource="PAGE: 5:1:91669" waittime="2910" ownerId="128212" transactionname="user_transaction" lasttranstarted="2012-04-19T21:23:28.567" XDES="0xedcd9000" lockMode="IX" schedulerid="2" kpid="5500" status="suspended" spid="68" sbid="0" ecid="0" priority="0" trancount="2" lastbatchstarted="2012-04-19T21:23:29.007" lastbatchcompleted="2012-04-19T21:23:29.007" clientapp=".Net SqlClient Data Provider" hostname="AMIT-PC" hostpid="6752" loginname="sa" isolationlevel="read committed (2)" xactid="128212" currentdb="5" lockTimeout="4294967295" clientoption1="671088672" clientoption2="128056">
<executionStack>
<frame procname="adhoc" line="1" stmtstart="154" sqlhandle="0x02000000f4d83b0df2bfedc5a346288c21fa78e07eb152f6">
UPDATE blocks SET blockData=@blockData, syncedToGeneration=@syncedToGeneration WHERE indexId=@indexId AND blockId=@blockId </frame>
<frame procname="unknown" line="1" sqlhandle="0x000000000000000000000000000000000000000000000000">
unknown </frame>
</executionStack>
<inputbuf>
(@indexId bigint,@blockId int,@blockData ntext,@syncedToGeneration int)UPDATE blocks SET blockData=@blockData, syncedToGeneration=@syncedToGeneration WHERE indexId=@indexId AND blockId=@blockId </inputbuf>
</process>
</process-list>
<resource-list>
<pagelock fileid="1" pageid="91671" dbid="5" objectname="isqdb.dbo.blocks" id="lock5c54700" mode="IX" associatedObjectId="72057594043826176">
<owner-list>
<owner id="process5a13b88" mode="IX"/>
</owner-list>
<waiter-list>
<waiter id="process5a274c8" mode="S" requestType="wait"/>
</waiter-list>
</pagelock>
<pagelock fileid="1" pageid="91669" dbid="5" objectname="isqdb.dbo.blocks" id="lock5b84780" mode="S" associatedObjectId="72057594043826176">
<owner-list>
<owner id="process5a274c8" mode="S"/>
</owner-list>
<waiter-list>
<waiter id="process5a13b88" mode="IX" requestType="wait"/>
</waiter-list>
</pagelock>
</resource-list>
</deadlock>
</deadlock-list>
Best Answer
This looks like a typical case of table scan (possible missing indexes, but could be more). The SELECT has chosen a page lock granularity, indicative of a large scan (on
blocks
table). Also note how all the locks are on the same resource rowset, the clustered index, another indication that the SELECT does not use a selective secondary index to locate the rows. You seem to had hoped that the nonclustered index on(indexId, itemId)
, which you describe as existing, would had beed chosen and perhaps, had it be chosen, the query would not deadlock. As things stand it looks like is ignored most likely because of selectivity (it hits the tipping point). Is difficult to advise for a better index w/o knowing your workload and data model. At the very least I can say that the requirement to projectblockData, syncedToGeneration
in the SELECT list makes the index non-covering, so perhaps adding them as included columns would be a first step. But w/o knowing the size of these columns (blockData
seems like the name of a large one...) is, again, difficult to predict the outcome.To exemplify why the scan is the culprit, consider any two rows updated by the UPDATE statement. The UPDATE will update them in the order
(indexId, blockId)
but the SELECT scan will see them in the order(indexId, itemId)
. Unless there is a functional dependency betweenitemId
andblockId
that guarantees that the two indexe keys return all the rows int he same order, there will always be some pair of rows for which the order is reversed between the clustered index and the non-clustered index. The gist of it is that because the SELECT does a scan, it is guaranteed to visit every row. Therefore any pair of rows updated by the UPDATE will be visited by the scan. There are four possibilities:This situation will always arrise with multi-row updates in the presence of scans. The solution is usually to replace the scan with a more selective operation, which has the added benefit of improving performance. If the SELECT and the UPDATE only visit the rows they really desire to process then a deadlock can occur only if two operations occur concurrently on the same logical data (ie. the same item is being read and updated at the same time) but most times on OLTP systems this is prevented by the business workflow. It can still happen, but with a much reduced frequency. As a side note, even when operating on disjoint logical items it is still a game of probabilities due to hash collisions.
One easy way out is, indeed, deploying snapshot isolation. But the underlying problem (the scan) will be only masked, not alleviated. Other problems arising from the scan (poor performance, high latency and slow response time, buffer pool pollution etc) are stil going to be present. You should fix the scan problem and deploy snapshot isolation.
P.S. note that the fact that the scan uses PAGE granularity locks has no relevance. Had the scan be using ROW granularity the deadlock would simply occur on rows instead of pages. But the fact that PAGE granularity is chose is significant as a telltale for a scan.