SQL Server Deadlock – Resolving Deadlocks with Simple Queries

deadlockparallelismsql server

I have an application that must delete and insert registries from a table, the table is storing the historic registries for an access control and is stored with the datetime for the check in the system.

Now this is the issue. I have a threadpool in which each thread works over one day in a time period (normally a month). When the process start, it's giving me some deadlocks. I made a lock trace and in the deadlock graph only says that a process was blocked but no which sentence or transaction was made, and in the deadlock log it say Parallel query worker thread was involved in a deadlock. How can I make a more detailed trace or query in order to know what is making my app and DB blocking?

Thanks

EDIT

here is the XML for the trace I already made in the past days:

<deadlock victim="process45f9fb048">
  <process-list>
   <process id="process45f9fb048" taskpriority="0" logused="0" waitresource="PAGE: 4:1:98644" waittime="5369" ownerId="1282041063" transactionname="DELETE" lasttranstarted="2014-06-27T03:15:01.157" XDES="0xbeb91620" lockMode="U" schedulerid="7" kpid="1984" status="suspended" spid="204" sbid="0" ecid="4" priority="0" trancount="0" lastbatchstarted="2014-06-27T03:15:00.620" lastbatchcompleted="2014-06-27T03:15:00.620" clientapp="SQLAgent - TSQL JobStep (Job 0xF408BE64B781AB40B05180C1B2EB9DC8 : Step 1)" hostname="REPORTESCRYSTAL" hostpid="28848" isolationlevel="read committed (2)" xactid="1282041063" currentdb="4" lockTimeout="4294967295" clientoption1="673185824" clientoption2="128056">
    <executionStack>
     <frame procname="msdb.dbo.sp_syscollector_purge_collection_logs" line="31" stmtstart="2152" stmtend="2720" sqlhandle="0x03000400af551732fc7ab30044a000000100000000000000">
DELETE FROM dbo.sysssislog
        FROM dbo.sysssislog AS s
        INNER JOIN dbo.syscollector_execution_log_internal AS l ON (l.package_execution_id = s.executionid)
        INNER JOIN #purged_log_ids AS i ON i.log_id = l.log_id

    -- Then delete the actual logs     </frame>
     <frame procname="adhoc" line="2" stmtstart="28" sqlhandle="0x0100040034b4980c80c0a260030000000000000000000000">
EXEC [dbo].[sp_syscollector_purge_collection_logs]     </frame>
    </executionStack>
    <inputbuf>
    </inputbuf>
   </process>
   <process id="process961c988" taskpriority="0" logused="0" waitresource="PAGE: 4:1:98644" waittime="5324" ownerId="1282041063" transactionname="DELETE" lasttranstarted="2014-06-27T03:15:01.157" XDES="0xaece7620" lockMode="U" schedulerid="3" kpid="27920" status="suspended" spid="204" sbid="0" ecid="2" priority="0" trancount="0" lastbatchstarted="2014-06-27T03:15:00.620" lastbatchcompleted="2014-06-27T03:15:00.620" clientapp="SQLAgent - TSQL JobStep (Job 0xF408BE64B781AB40B05180C1B2EB9DC8 : Step 1)" hostname="REPORTESCRYSTAL" hostpid="28848" isolationlevel="read committed (2)" xactid="1282041063" currentdb="4" lockTimeout="4294967295" clientoption1="673185824" clientoption2="128056">
    <executionStack>
     <frame procname="msdb.dbo.sp_syscollector_purge_collection_logs" line="31" stmtstart="2152" stmtend="2720" sqlhandle="0x03000400af551732fc7ab30044a000000100000000000000">
DELETE FROM dbo.sysssislog
        FROM dbo.sysssislog AS s
        INNER JOIN dbo.syscollector_execution_log_internal AS l ON (l.package_execution_id = s.executionid)
        INNER JOIN #purged_log_ids AS i ON i.log_id = l.log_id

    -- Then delete the actual logs     </frame>
     <frame procname="adhoc" line="2" stmtstart="28" sqlhandle="0x0100040034b4980c80c0a260030000000000000000000000">
EXEC [dbo].[sp_syscollector_purge_collection_logs]     </frame>
    </executionStack>
    <inputbuf>
    </inputbuf>
   </process>
   <process id="processb543dc8" taskpriority="0" logused="0" waitresource="PAGE: 4:1:108775" waittime="5367" ownerId="1282041011" transactionname="DELETE" lasttranstarted="2014-06-27T03:15:01.153" XDES="0x6771b32d0" lockMode="U" schedulerid="4" kpid="27392" status="suspended" spid="197" sbid="0" ecid="2" priority="0" trancount="0" lastbatchstarted="2014-06-27T03:15:00.617" lastbatchcompleted="2014-06-27T03:15:00.617" clientapp="SQLAgent - TSQL JobStep (Job 0xE0A214220ED4114D8DD04385F151A9B2 : Step 1)" hostname="REPORTESCRYSTAL" hostpid="28848" isolationlevel="read committed (2)" xactid="1282041011" currentdb="4" lockTimeout="4294967295" clientoption1="673185824" clientoption2="128056">
    <executionStack>
     <frame procname="msdb.dbo.sp_syscollector_purge_collection_logs" line="31" stmtstart="2152" stmtend="2720" sqlhandle="0x03000400af551732fc7ab30044a000000100000000000000">
DELETE FROM dbo.sysssislog
        FROM dbo.sysssislog AS s
        INNER JOIN dbo.syscollector_execution_log_internal AS l ON (l.package_execution_id = s.executionid)
        INNER JOIN #purged_log_ids AS i ON i.log_id = l.log_id

    -- Then delete the actual logs     </frame>
     <frame procname="adhoc" line="2" stmtstart="28" sqlhandle="0x0100040034b4980c80c0a260030000000000000000000000">
EXEC [dbo].[sp_syscollector_purge_collection_logs]     </frame>
    </executionStack>
    <inputbuf>
    </inputbuf>
   </process>
   <process id="process9631948" taskpriority="0" logused="0" waitresource="PAGE: 4:1:108775" waittime="5301" ownerId="1282041011" transactionname="DELETE" lasttranstarted="2014-06-27T03:15:01.153" XDES="0x506b8f300" lockMode="U" schedulerid="5" kpid="23904" status="suspended" spid="197" sbid="0" ecid="1" priority="0" trancount="0" lastbatchstarted="2014-06-27T03:15:00.617" lastbatchcompleted="2014-06-27T03:15:00.617" clientapp="SQLAgent - TSQL JobStep (Job 0xE0A214220ED4114D8DD04385F151A9B2 : Step 1)" hostname="REPORTESCRYSTAL" hostpid="28848" isolationlevel="read committed (2)" xactid="1282041011" currentdb="4" lockTimeout="4294967295" clientoption1="673185824" clientoption2="128056">
    <executionStack>
     <frame procname="msdb.dbo.sp_syscollector_purge_collection_logs" line="31" stmtstart="2152" stmtend="2720" sqlhandle="0x03000400af551732fc7ab30044a000000100000000000000">
DELETE FROM dbo.sysssislog
        FROM dbo.sysssislog AS s
        INNER JOIN dbo.syscollector_execution_log_internal AS l ON (l.package_execution_id = s.executionid)
        INNER JOIN #purged_log_ids AS i ON i.log_id = l.log_id

    -- Then delete the actual logs     </frame>
     <frame procname="adhoc" line="2" stmtstart="28" sqlhandle="0x0100040034b4980c80c0a260030000000000000000000000">
EXEC [dbo].[sp_syscollector_purge_collection_logs]     </frame>
    </executionStack>
    <inputbuf>
    </inputbuf>
   </process>
   <process id="process964fdc8" taskpriority="0" logused="10000" waittime="5745" schedulerid="8" kpid="32756" status="suspended" spid="204" sbid="0" ecid="0" priority="0" trancount="2" lastbatchstarted="2014-06-27T03:15:00.620" lastbatchcompleted="2014-06-27T03:15:00.620" clientapp="SQLAgent - TSQL JobStep (Job 0xF408BE64B781AB40B05180C1B2EB9DC8 : Step 1)" hostname="REPORTESCRYSTAL" hostpid="28848" loginname="SALUDTOTAL\bo_agent" isolationlevel="read committed (2)" xactid="1282041063" currentdb="4" lockTimeout="4294967295" clientoption1="673185824" clientoption2="128056">
    <executionStack>
     <frame procname="msdb.dbo.sp_syscollector_purge_collection_logs" line="31" stmtstart="2152" stmtend="2720" sqlhandle="0x03000400af551732fc7ab30044a000000100000000000000">
DELETE FROM dbo.sysssislog
        FROM dbo.sysssislog AS s
        INNER JOIN dbo.syscollector_execution_log_internal AS l ON (l.package_execution_id = s.executionid)
        INNER JOIN #purged_log_ids AS i ON i.log_id = l.log_id

    -- Then delete the actual logs     </frame>
     <frame procname="adhoc" line="2" stmtstart="28" sqlhandle="0x0100040034b4980c80c0a260030000000000000000000000">
EXEC [dbo].[sp_syscollector_purge_collection_logs]     </frame>
    </executionStack>
    <inputbuf>

            EXEC [dbo].[sp_syscollector_purge_collection_logs]
                </inputbuf>
   </process>
   <process id="process5fc0b0508" taskpriority="0" logused="10000" waittime="2896" schedulerid="2" kpid="8248" status="suspended" spid="204" sbid="0" ecid="5" priority="0" trancount="0" lastbatchstarted="2014-06-27T03:15:00.620" lastbatchcompleted="2014-06-27T03:15:00.620" clientapp="SQLAgent - TSQL JobStep (Job 0xF408BE64B781AB40B05180C1B2EB9DC8 : Step 1)" hostname="REPORTESCRYSTAL" hostpid="28848" isolationlevel="read committed (2)" xactid="1282041063" currentdb="4" lockTimeout="4294967295" clientoption1="673185824" clientoption2="128056">
    <executionStack>
     <frame procname="msdb.dbo.sp_syscollector_purge_collection_logs" line="31" stmtstart="2152" stmtend="2720" sqlhandle="0x03000400af551732fc7ab30044a000000100000000000000">
DELETE FROM dbo.sysssislog
        FROM dbo.sysssislog AS s
        INNER JOIN dbo.syscollector_execution_log_internal AS l ON (l.package_execution_id = s.executionid)
        INNER JOIN #purged_log_ids AS i ON i.log_id = l.log_id

    -- Then delete the actual logs     </frame>
     <frame procname="adhoc" line="2" stmtstart="28" sqlhandle="0x0100040034b4980c80c0a260030000000000000000000000">
EXEC [dbo].[sp_syscollector_purge_collection_logs]     </frame>
    </executionStack>
    <inputbuf>
    </inputbuf>
   </process>
  </process-list>
  <resource-list>
   <pagelock fileid="1" pageid="98644" dbid="4" objectname="msdb.dbo.sysssislog" id="lock544471d00" mode="U" associatedObjectId="72057594047037440">
    <owner-list>
     <owner id="process9631948" mode="U"/>
    </owner-list>
    <waiter-list>
     <waiter id="process45f9fb048" mode="U" requestType="wait"/>
    </waiter-list>
   </pagelock>
   <pagelock fileid="1" pageid="98644" dbid="4" objectname="msdb.dbo.sysssislog" id="lock544471d00" mode="U" associatedObjectId="72057594047037440">
    <owner-list/>
    <waiter-list>
     <waiter id="process961c988" mode="U" requestType="wait"/>
    </waiter-list>
   </pagelock>
   <pagelock fileid="1" pageid="108775" dbid="4" objectname="msdb.dbo.sysssislog" id="lock6a3143280" mode="U" associatedObjectId="72057594047037440">
    <owner-list>
     <owner id="process964fdc8" mode="U"/>
    </owner-list>
    <waiter-list>
     <waiter id="processb543dc8" mode="U" requestType="wait"/>
    </waiter-list>
   </pagelock>
   <pagelock fileid="1" pageid="108775" dbid="4" objectname="msdb.dbo.sysssislog" id="lock6a3143280" mode="U" associatedObjectId="72057594047037440">
    <owner-list/>
    <waiter-list>
     <waiter id="process9631948" mode="U" requestType="wait"/>
    </waiter-list>
   </pagelock>
   <exchangeEvent id="Port80185c00" WaitType="e_waitPortOpen" nodeId="2">
    <owner-list>
     <owner id="process5fc0b0508"/>
    </owner-list>
    <waiter-list>
     <waiter id="process964fdc8"/>
    </waiter-list>
   </exchangeEvent>
   <exchangeEvent id="Pipe1937d8680" WaitType="e_waitPipeGetRow" nodeId="4">
    <owner-list>
     <owner id="process45f9fb048"/>
     <owner id="process961c988"/>
    </owner-list>
    <waiter-list>
     <waiter id="process5fc0b0508"/>
    </waiter-list>
   </exchangeEvent>
  </resource-list>
 </deadlock>
</deadlock-list>

EDIT 2

I've already replicated the deadlock in another SQL Instance, here is the graph

<deadlock victim="process9ada4508">
  <process-list>
   <process id="process9ada4508" taskpriority="0" logused="0" waitresource="PAGE: 7:1:595136" waittime="2411" ownerId="391631459" transactionname="DELETE" lasttranstarted="2014-07-08T10:51:37.907" XDES="0x9a5c9b00" lockMode="U" schedulerid="2" kpid="7308" status="suspended" spid="93" sbid="0" ecid="2" priority="0" trancount="0" lastbatchstarted="2014-07-08T10:51:37.907" lastbatchcompleted="2014-07-08T10:51:37.907" clientapp=".Net SqlClient Data Provider" hostname="CENTROCONTROL" hostpid="9016" isolationlevel="read committed (2)" xactid="391631459" currentdb="7" lockTimeout="4294967295" clientoption1="671088672" clientoption2="128056">
    <executionStack>
     <frame procname="adhoc" line="1" stmtstart="54" sqlhandle="0x020000000826022c8d837edd59a697e6b8f2b323fc72898d">
DELETE FROM TBL_ITAS_USER_REGISTRY WHERE (URE_DATE_IN &gt;= @p0 AND URE_DATE_IN &lt; @p1) OR (URE_DATE_OUT &gt;= @p0 AND URE_DATE_OUT &lt; @p1)     </frame>
     <frame procname="unknown" line="1" sqlhandle="0x000000000000000000000000000000000000000000000000">
unknown     </frame>
    </executionStack>
    <inputbuf>
    </inputbuf>
   </process>
   <process id="process43fdc8" taskpriority="0" logused="0" waitresource="PAGE: 7:1:595136" waittime="5046" ownerId="391632197" transactionname="DELETE" lasttranstarted="2014-07-08T10:51:39.943" XDES="0x2388fc080" lockMode="U" schedulerid="1" kpid="7944" status="suspended" spid="84" sbid="0" ecid="3" priority="0" trancount="0" lastbatchstarted="2014-07-08T10:51:39.943" lastbatchcompleted="2014-07-08T10:51:39.943" clientapp=".Net SqlClient Data Provider" hostname="CENTROCONTROL" hostpid="9016" isolationlevel="read committed (2)" xactid="391632197" currentdb="7" lockTimeout="4294967295" clientoption1="671088672" clientoption2="128056">
    <executionStack>
     <frame procname="adhoc" line="1" stmtstart="54" sqlhandle="0x020000000826022c8d837edd59a697e6b8f2b323fc72898d">
DELETE FROM TBL_ITAS_USER_REGISTRY WHERE (URE_DATE_IN &gt;= @p0 AND URE_DATE_IN &lt; @p1) OR (URE_DATE_OUT &gt;= @p0 AND URE_DATE_OUT &lt; @p1)     </frame>
     <frame procname="unknown" line="1" sqlhandle="0x000000000000000000000000000000000000000000000000">
unknown     </frame>
    </executionStack>
    <inputbuf>
    </inputbuf>
   </process>
   <process id="process9ada5288" taskpriority="0" logused="10000" waittime="1946" schedulerid="2" kpid="4680" status="suspended" spid="93" sbid="0" ecid="0" priority="0" trancount="2" lastbatchstarted="2014-07-08T10:51:37.907" lastbatchcompleted="2014-07-08T10:51:37.907" clientapp=".Net SqlClient Data Provider" hostname="CENTROCONTROL" hostpid="9016" loginname="Robotec" isolationlevel="read committed (2)" xactid="391631459" currentdb="7" lockTimeout="4294967295" clientoption1="671088672" clientoption2="128056">
    <executionStack>
     <frame procname="adhoc" line="1" stmtstart="54" sqlhandle="0x020000000826022c8d837edd59a697e6b8f2b323fc72898d">
DELETE FROM TBL_ITAS_USER_REGISTRY WHERE (URE_DATE_IN &gt;= @p0 AND URE_DATE_IN &lt; @p1) OR (URE_DATE_OUT &gt;= @p0 AND URE_DATE_OUT &lt; @p1)     </frame>
     <frame procname="unknown" line="1" sqlhandle="0x000000000000000000000000000000000000000000000000">
unknown     </frame>
    </executionStack>
    <inputbuf>
(@p0 datetime,@p1 datetime)DELETE FROM TBL_ITAS_USER_REGISTRY WHERE (URE_DATE_IN &gt;= @p0 AND URE_DATE_IN &lt; @p1) OR (URE_DATE_OUT &gt;= @p0 AND URE_DATE_OUT &lt; @p1)    </inputbuf>
   </process>
  </process-list>
  <resource-list>
   <pagelock fileid="1" pageid="595136" dbid="7" objectname="MFEnterprise_v2.dbo.TBL_ITAS_USER_REGISTRY" id="lock800f1200" mode="U" associatedObjectId="72057594063224832">
    <owner-list/>
    <waiter-list>
     <waiter id="process9ada4508" mode="U" requestType="wait"/>
    </waiter-list>
   </pagelock>
   <pagelock fileid="1" pageid="595136" dbid="7" objectname="MFEnterprise_v2.dbo.TBL_ITAS_USER_REGISTRY" id="lock800f1200" mode="U" associatedObjectId="72057594063224832">
    <owner-list>
     <owner id="process9ada5288" mode="U"/>
    </owner-list>
    <waiter-list>
     <waiter id="process43fdc8" mode="U" requestType="wait"/>
    </waiter-list>
   </pagelock>
   <exchangeEvent id="Pipe188f61be0" WaitType="e_waitPipeGetRow" nodeId="2">
    <owner-list>
     <owner id="process9ada4508"/>
    </owner-list>
    <waiter-list>
     <waiter id="process9ada5288"/>
    </waiter-list>
   </exchangeEvent>
  </resource-list>
 </deadlock>
</deadlock-list>

Best Answer

Based on what I see in SQL Sentry Plan Explorer PRO*, looks like two different jobs are fighting over who is going to get to delete some rows (click to enlarge):

click to enlarge

Maybe there is some overlap because Microsoft does some not-so-optimal things in msdb.dbo.sp_syscollector_purge_collection_logs, for instance letting each invocation of the procedure grab TOP (@delete_batch_size) rows with no ORDER BY or exclusion techniques, meaning two people calling the stored procedure could certainly end up trying to delete rows on the same page, or maybe even the same rows.

So the easiest solution:

Don't have two jobs running the same cleanup procedure.

If you need to run both jobs (again, please explain why?), my first thought would be to set MAXDOP to 1 for the delete queries in that stored procedure (yes, you can modify it, just keep that in source control because your change could be undone by service packs, upgrades, etc).

DELETE ... OPTION (MAXDOP 1);

I don't know for sure that this is the cause; would be interesting to see an actual plan from calling the procedure manually outside the context of the job.

If you don't want to modify the stored procedure, you have at least one other option if you are running Enterprise Edition: Resource Governor. Just have the login the job executes as belong to a workload group with max_dop set to 1. I have instructions for doing so (framed in the context of maintenance jobs, but same concept) in my white paper, Using the Resource Governor.

* Disclaimer: I work for SQL Sentry.