Azure SQL – How to Get Deadlock Information

azure-sql-databasedeadlocksql server

My application has been logging dead lock exceptions return from sql azure to my .net application: Transaction (Process ID 152) was deadlocked on lock resources with another process....

I would like to investigate this further to see if I can prevent those deadlocks from happening.

How do I get the details about those past and future deadlocks? I remember that the sql server profiler is able to extract some deadlock events with some xml that shows all the queries involved in the dead lock. That is what I'm after.

However, since this runs on Sql Azure, there is no profiler.

Best Answer

You can use extended events on Azure to capture deadlock details.

Following article explains in details.

https://blogs.msdn.microsoft.com/azuresqldbsupport/2017/01/21/lesson-learned-19-how-to-obtain-the-deadlocks-of-your-azure-sql-database/

Based on the comment I am adding this section. I just tested this and it works.

You need to run the following query in Master database (Not user database where your deadlock happened) in order to get the XML of the deadlock.

WITH deadlockcte 
     AS (SELECT Cast(event_data AS XML) AS [target_data_XML] 
         FROM   sys.Fn_xe_telemetry_blob_target_read_file('dl', NULL, NULL, NULL 
                )) 
SELECT 
target_data_xml.value('(/event/@timestamp)[1]', 'DateTime2') 
AS Timestamp, 
target_data_xml.query('/event/data[@name="xml_report"]/value/deadlock') 
AS deadlock_xml, 
target_data_xml.query('/event/data[@name="database_name"]/value').value('(/value)[1]', 'nvarchar(100)') AS db_name 
FROM   deadlockcte

Which will have the details about the deadlock processes and you can get a deadlock graph by:

Copy the deadlock_xml column results from the previous query and load into a text file. If more than one row is returned, you will want to do each row result separate.
Save the file as a ‘.xdl’ extension, (e.g. deadlock.xdl) which can be viewed in tools such as SQL Server Management Studio as a deadlock report/graphic.

Based on the last comment I am adding my test code which does show text in inputbuffer.

 /*

Deadlock simulation code was copied from:
http://stackoverflow.com/questions/22825147/how-to-simulate-deadlock-on-sql-server

*/

--Simulating deadlock

IF OBJECT_ID('Employees') IS NOT NULL
    DROP TABLE Employees

CREATE TABLE Employees (
    EmpId INT IDENTITY,
    EmpName VARCHAR(16),
    Phone VARCHAR(16)
)
GO



IF OBJECT_ID('Suppliers') IS NOT NULL
    DROP TABLE Suppliers

CREATE TABLE Suppliers(
    SupplierId INT IDENTITY,
    SupplierName VARCHAR(64),
    Fax VARCHAR(16)
)
GO


INSERT INTO Employees (EmpName, Phone)
VALUES ('Martha', '800-555-1212'), ('Jimmy', '619-555-8080')
GO
INSERT INTO Suppliers (SupplierName, Fax)
VALUES ('Acme', '877-555-6060'), ('Rockwell', '800-257-1234')
GO

/*
--Window 1
BEGIN TRAN; 
UPDATE Employees
SET EmpName = 'Mary'
WHERE empid = 1


--Window=2
BEGIN TRAN; 
UPDATE Suppliers
SET Fax = N'555-1212'
WHERE supplierid = 1

Window 1
UPDATE Suppliers
SET Fax = N'555-1212'
WHERE supplierid = 1

window 2
UPDATE Employees
SET phone = N'555-9999'
WHERE empid = 1
--commit
*/


DROP TABLE Employees
GO
DROP TABLE Suppliers
GO

Related Solutions

Sql-server – Lock escalation problem on a trigger

A create script for tableA including all indexes and the execution plans for the two UPDATE statements involved are required for a definitive solution. That being said, we can still combine the deadlock graph provided with knowledge of how SQL Server performs updates and have a very good chance of resolving this issue.

This deadlock involves two spids performing UPDATES on tableA, 58 & 59.

Spid 58's Query:

Update b set  syd_id=d.syd_id    
        from tableA b    
        inner join tableC d with (nolock) on b.syd_pers_id=d.syd_pers_id    
        --and b.game='es_lotto'    
        and b.game=d.game  and isnull(b.syd_id,0)=0

Spid 59's Query:

Update b set Draw_Date=d.draw_date    
        from tableA b
        inner join tableB d with (nolock) on b.draw_no=d.draw_no    
        and left(b.game,2)='UK' and b.Draw_Date is null

In this deadlock, there are two resources involved:

a "Key Lock" on the index IX_tableA_Draw_Date.
- Owned by spid 58, mode = 'X' (Exclusive)
- Requested by spid 59, mode = 'U' (Update)
a "Page" lock for a different, unspecified index on tableA
- Owned by spid 59, mode = 'IX' (Intent Exclusive)
- Requsted by spid 58, mode = 'U' (Update)

There are also 3 different lock ownership modes in use (U, X, IX):

"U" or Update Lock. Conceptually, the sql plan for an UPDATE statement has two parts: A "select" part in which the data to modify is identified, and a "modification" part where the modification actually happens. Update locks are taken during the "select" part to protect the rows from modification until we get around to actually modifying them (upon which time an X lock is acquired). Update locks are compatible with "S" (Shared) locks, so readers don't get blocked, however, they are incompatible with X, IX and other U locks.
"X" or Exclusive Lock. Exclusive locks are used to protect data being modified during inserts, updates and deletes. They are incompatible with all other lock types, and ensure that only one writer can access a resource at the same time.
"IX" or Intent Exclusive Lock. Intent Exclusive locks are taken on higher level resources before lower level resources get X locks. For example, before an X lock on a row is taken, an IX lock is taken on both the associated page and the entire table. IX locks indicate that X locks exist at a lower level. They can also be changed into X locks as part of lock escalation. IX locks are incompatible with X locks, so if two transactions have IX locks on a table, then neither transaction can escalate their page or row locks to the table level. Intent locks (including IX) are an optimization that allow SQL to check one higher level lock instead of checking thousands of lower level locks.

Now, let's examine spid 58's UPDATE query in detail. We know from the deadlock graph that spid 58 has an exclusive lock on a row in IX_tableA_Draw_Date. Therefore, we know that syd_id is either in the index key, in the clustered index key, or is an included column (I'm going to ignore the possibility that spid 58 already owned the lock from a previous statement in the trigger).

Spid 58 is also attempting to acquire an update lock on a page in a different index. This tells two things:

We know that the select portion of the update is still reading and locking data after the modification portion has started. This implies that there are no blocking operators affecting the rows from tableA in the plan. For example, if rows were read from tableA, sorted, and used in a MERGE join; then all the rows would be read (and U-locked) before any exclusive locks are taken. That isn't the case.
We're attempting to take an Update lock on a PAGE, not a ROW, so it's likely that we're doing an index scan.

We can also assume that a relatively small number of rows are expected to be updated (Due to the multi-column join combined with updating nulls or 0's in an insert trigger, this doesn't feel like a query that updates a large percentage of rows). Taking all this into consideration, it is likely that the UPDATE is reading rows from tableA using a scan on our mystery index (don't know if it's clustered or non-clustered ) and performing a LOOP JOIN against tableC to get the tableC.syd_id.

Performing a similiar analysis for spid 59, it is likely that we're seeking into IX_tableA_Draw_Date where "Draw_Date is null", reading (and taking update locks on ROWS), LOOP JOIN against tableB, then taking a X ROW lock and updating our "mystery" index (the IX lock on the page implies that there are X locks on rows).

Where are we? And what do we do next?
We know we've got two update statements. Both are reading and writing to the same pair of indexes (with the reading and writing swapped). Both are waiting for an update lock after they've already updated some rows. Spid 58 is reading pages, 59 is reading rows. Both are writing and locking rows. Lock escalation isn't a factor, as row locks -> table locks ( not page locks).

If Spid 58 was taking row locks instead of page locks on the mystery index, then you would only have deadlocks if both queries updated the same row. You could use a ROWLOCK query hint to make that happen, however you run the risk of increased lock escalation to table locks, which could be worse.

Another approach is to separate the update into two parts.

select id, --Not sure what the PK is...
    syd_id
into #tmp 
from tableA inner join tableB (nolock) on b.syd_pers_id=d.syd_pers_id    
    --and b.game='es_lotto'    
    and b.game=d.game  and isnull(b.syd_id,0)=0

UPDATE tableA SET syd_id = t.syd_id from tableA d inner join #tmp t on t.id = d.id

This would change the order of locks taken as the non-clustered indexes will only be locked for writes, not reads.

Recommendation

Deadlocks are a natural by-product of RDBMS's. Even if you do everything "right" you can't always eliminate them. At 3-4 Deadlocks per day, the correct approach is probably to just wrap the statement in some error handling so that it is re-tried if a deadlock occurs.

http://msdn.microsoft.com/en-us/library/aa175791(v=sql.80).aspx has an example for SQL 2005. However, since this is in a trigger, you may need to add the error handling outside of the INSERT statement that fires the trigger.

Sql-server – Understanding SIX lock in Microsoft SQL-Server

To answer that I have to take a little detour, so bear with me.

If two sessions take a lock on the same resource SQL Server checks the lock compatibility map and if the second request is not "compatible" with the first, the second session has to wait. There are three lock types "S"hared, "U"pdate and e"X"clusive. S locks are taken to read from a resource and X locks are taken to write to a resource. S locks are compatible with each other, X locks are not compatible with anything else. U locks are a hybrid that is used in some cases for deadlock prevention.

Now, SQL Server can take locks on several levels:Table, Partition, Page and Row. So if session one takes a table lock and session two takes a non-compatible lock on one row of the table, those two locks are not on the same resource and SQL Server won't detect the collision. To protect against that, SQL Server always starts to take a lock on the table level and works its way down the hierarchy. Now the point of page and row locks is higher concurrency, so if one session wants to write to one row and another session wants to write to another row, they should not block each other. If a session in addition to taking a lock on a row also has to take the same lock on the table, that advantage is gone. So instead of taking an exclusive lock (X) on the table, the session requests an intend-exclusive lock (IX). This lock is compatible with other intend locks but not with other "real" locks. So another session can take an intend-exclusive lock on the same table as well. The intend-exclusive lock says, that the session intends to take an exclusive lock on a lower level resource. The same happens on the page level, if the intended lock is a row lock, so after all is done, the session has an IX lock on the table and on one of the pages and an X lock on one of the rows in that page. This also means, that you will never find an intend lock on a row as rows are the lowest level in the lock hierarchy.

In some circumstances a session holds an S lock on the table or a page. If the session now (within the same transaction) requests an X lock on a row in that same table, it first has to take an IX lock on the table/page. However, a session can hold only one lock on any given resource. So to take the IX lock, it would have to release the S lock wich is probably not desired, so SQL Server offers a combination: SIX.

The reason why you have a page lock is due to SQL Server sometimes deciding that it would be better to lock the page instead of locking each row. That happens often if there are very many locks taken between al sessions already, but can have many other reasons too.

So far the theory.

Now in your case the SIX lock is held by a three table join select query. A select never takes any type of lock that is not a shared lock unless you explicitly tell it to (e.g. with a XLOCK hint). Such a hint is not visible within the input buffer, so I assume the IX part is a left over from the last batch on this connection. If you are using connection pooling and forget to cleanup all open transactions, such a lock can live potentially forever. But it becomes also very hard to troubleshoot.

You could start by running an XEvent session that pairs OPEN TRANs with COMMITs and see if you can find the culprit that way.

Best Answer

Related Solutions

Sql-server – Lock escalation problem on a trigger

Recommendation

Sql-server – Understanding SIX lock in Microsoft SQL-Server

Related Question