Sql-server – Why am i getting deadlocks with 1 read and 1 write thread in SQL

deadlocksql serversql-server-2008transaction

if anyone could please help me understand why the following transactions deadlock?
I am providing the 2 transactions and the indexes for each table:

Transaction 1:

SELECT blockId, blockData, syncedToGeneration, itemId 
FROM blocks 
WHERE indexId=@indexId 
and itemId IN (SELECT itemId 
               FROM KeywordReferences 
               WHERE indexId=@indexId 
               AND keywordRootId IN (360,4498,359,1229))

Transaction 2:

UPDATE blocks 
   SET blockData=@blockData,
       syncedToGeneration=@syncedToGeneration 
   WHERE indexId=@indexId 
AND blockId=@blockId

(please notice that the 'IN' section in the first transaction is much longer and contains about 30 values, which i truncated for sake of readability)

The blocks table has the following indexes:
– indexId->blockId (Clustered)
– indexId->itemId

indexId->itemId

The keywordReferences table has the following indexes:
– indexId_>keywordRootId (Clustered)
– indexId->keywordRootId->score
– indexId->itemId
– indexId->blockId

Bellow is the deadlock graph xml output:

<deadlock-list>
 <deadlock victim="process5a274c8">
  <process-list>
   <process id="process5a274c8" taskpriority="0" logused="0" waitresource="PAGE: 5:1:91671" waittime="2709" ownerId="122348" transactionname="SELECT" lasttranstarted="2012-04-19T21:23:26.680" XDES="0xf382aca0" lockMode="S" schedulerid="4" kpid="10992" status="suspended" spid="69" sbid="0" ecid="0" priority="0" trancount="0" lastbatchstarted="2012-04-19T21:23:26.650" lastbatchcompleted="2012-04-19T21:23:26.647" clientapp=".Net SqlClient Data Provider" hostname="AMIT-PC" hostpid="6752" loginname="sa" isolationlevel="read committed (2)" xactid="122348" currentdb="5" lockTimeout="4294967295" clientoption1="671088672" clientoption2="128056">
    <executionStack>
     <frame procname="adhoc" line="1" stmtstart="34" sqlhandle="0x020000002e9a3633b816ffc89dc234b4c0351887e4e1b2cf">
SELECT blockId, blockData, syncedToGeneration, itemId FROM blocks WHERE indexId=@indexId and itemId IN (SELECT itemId FROM KeywordReferences WHERE indexId=@indexId AND keywordRootId IN (360,4498,359,1229,2143,14330,7661,3755,1156,21490,5567,1933,429,28197,2,3165,524,3182,2655,27262,17407,2673,570,1478,3802,6838,19668,17,6586,2484,2794,1640,5171,2558,6592,5833,695,1199,2307,335,1351,6651,6899,3740,7048,22030,14356,597,3175,3965,3297,2711,14484,2761,2265,28,1647,3223,226,304,298,1157,197,2696,21172,19149,9,1159,135,1,3166,23325))     </frame>
     <frame procname="unknown" line="1" sqlhandle="0x000000000000000000000000000000000000000000000000">
unknown     </frame>
    </executionStack>
    <inputbuf>
(@indexId bigint)SELECT blockId, blockData, syncedToGeneration, itemId FROM blocks WHERE indexId=@indexId and itemId IN (SELECT itemId FROM KeywordReferences WHERE indexId=@indexId AND keywordRootId IN (360,4498,359,1229,2143,14330,7661,3755,1156,21490,5567,1933,429,28197,2,3165,524,3182,2655,27262,17407,2673,570,1478,3802,6838,19668,17,6586,2484,2794,1640,5171,2558,6592,5833,695,1199,2307,335,1351,6651,6899,3740,7048,22030,14356,597,3175,3965,3297,2711,14484,2761,2265,28,1647,3223,226,304,298,1157,197,2696,21172,19149,9,1159,135,1,3166,23325))    </inputbuf>
   </process>
   <process id="process5a13b88" taskpriority="0" logused="215304" waitresource="PAGE: 5:1:91669" waittime="2910" ownerId="128212" transactionname="user_transaction" lasttranstarted="2012-04-19T21:23:28.567" XDES="0xedcd9000" lockMode="IX" schedulerid="2" kpid="5500" status="suspended" spid="68" sbid="0" ecid="0" priority="0" trancount="2" lastbatchstarted="2012-04-19T21:23:29.007" lastbatchcompleted="2012-04-19T21:23:29.007" clientapp=".Net SqlClient Data Provider" hostname="AMIT-PC" hostpid="6752" loginname="sa" isolationlevel="read committed (2)" xactid="128212" currentdb="5" lockTimeout="4294967295" clientoption1="671088672" clientoption2="128056">
    <executionStack>
     <frame procname="adhoc" line="1" stmtstart="154" sqlhandle="0x02000000f4d83b0df2bfedc5a346288c21fa78e07eb152f6">
UPDATE blocks SET blockData=@blockData, syncedToGeneration=@syncedToGeneration WHERE indexId=@indexId AND blockId=@blockId     </frame>
     <frame procname="unknown" line="1" sqlhandle="0x000000000000000000000000000000000000000000000000">
unknown     </frame>
    </executionStack>
    <inputbuf>
(@indexId bigint,@blockId int,@blockData ntext,@syncedToGeneration int)UPDATE blocks SET blockData=@blockData, syncedToGeneration=@syncedToGeneration WHERE indexId=@indexId AND blockId=@blockId    </inputbuf>
   </process>
  </process-list>
  <resource-list>
   <pagelock fileid="1" pageid="91671" dbid="5" objectname="isqdb.dbo.blocks" id="lock5c54700" mode="IX" associatedObjectId="72057594043826176">
    <owner-list>
     <owner id="process5a13b88" mode="IX"/>
    </owner-list>
    <waiter-list>
     <waiter id="process5a274c8" mode="S" requestType="wait"/>
    </waiter-list>
   </pagelock>
   <pagelock fileid="1" pageid="91669" dbid="5" objectname="isqdb.dbo.blocks" id="lock5b84780" mode="S" associatedObjectId="72057594043826176">
    <owner-list>
     <owner id="process5a274c8" mode="S"/>
    </owner-list>
    <waiter-list>
     <waiter id="process5a13b88" mode="IX" requestType="wait"/>
    </waiter-list>
   </pagelock>
  </resource-list>
 </deadlock>
</deadlock-list>

Best Answer

This looks like a typical case of table scan (possible missing indexes, but could be more). The SELECT has chosen a page lock granularity, indicative of a large scan (on blocks table). Also note how all the locks are on the same resource rowset, the clustered index, another indication that the SELECT does not use a selective secondary index to locate the rows. You seem to had hoped that the nonclustered index on (indexId, itemId), which you describe as existing, would had beed chosen and perhaps, had it be chosen, the query would not deadlock. As things stand it looks like is ignored most likely because of selectivity (it hits the tipping point). Is difficult to advise for a better index w/o knowing your workload and data model. At the very least I can say that the requirement to project blockData, syncedToGeneration in the SELECT list makes the index non-covering, so perhaps adding them as included columns would be a first step. But w/o knowing the size of these columns (blockData seems like the name of a large one...) is, again, difficult to predict the outcome.

To exemplify why the scan is the culprit, consider any two rows updated by the UPDATE statement. The UPDATE will update them in the order (indexId, blockId) but the SELECT scan will see them in the order (indexId, itemId). Unless there is a functional dependency between itemId and blockId that guarantees that the two indexe keys return all the rows int he same order, there will always be some pair of rows for which the order is reversed between the clustered index and the non-clustered index. The gist of it is that because the SELECT does a scan, it is guaranteed to visit every row. Therefore any pair of rows updated by the UPDATE will be visited by the scan. There are four possibilities:

  • both rows UPDATEd are behind (in the clustered index key order) the current position of the scan (the current row being scanned). This is safe, the UPDATE blocks and waits for the scan.
  • both rows UPDATEd are ahead of the position of the scan. Also safe, the scan blocks and waits for the update.
  • the first row UPDATEd is behind the current position of the scan, but the second is ahead. Safe, the update blocks.
  • the first row UPDATEd is ahead of the current position of the scan, but the second is behind. This is a guaranteed deadlock. Note that the further appart the rows are in the clustered index order, the higher the probability of one being ahead and one being behind any scan position.

This situation will always arrise with multi-row updates in the presence of scans. The solution is usually to replace the scan with a more selective operation, which has the added benefit of improving performance. If the SELECT and the UPDATE only visit the rows they really desire to process then a deadlock can occur only if two operations occur concurrently on the same logical data (ie. the same item is being read and updated at the same time) but most times on OLTP systems this is prevented by the business workflow. It can still happen, but with a much reduced frequency. As a side note, even when operating on disjoint logical items it is still a game of probabilities due to hash collisions.

One easy way out is, indeed, deploying snapshot isolation. But the underlying problem (the scan) will be only masked, not alleviated. Other problems arising from the scan (poor performance, high latency and slow response time, buffer pool pollution etc) are stil going to be present. You should fix the scan problem and deploy snapshot isolation.

P.S. note that the fact that the scan uses PAGE granularity locks has no relevance. Had the scan be using ROW granularity the deadlock would simply occur on rows instead of pages. But the fact that PAGE granularity is chose is significant as a telltale for a scan.