Sql-server – Why would a foreign key from a table to itself cause a deadlock when running two deletes

csql server

I have a database with a table "SelfRef". SelfRef has two fields:

Id (guid, PK, not null)
SelfRefId (guid, nullable)

There is a foreign key constraint that maps the SelfRefId field back to the Id field.

I have an EntityFrameworkCore project that references the database. I am running the following test:

  1. Create two entries in the SelfRef table. In each case, the SelfRefId is null. Save changes.
  2. Delete both entries in separate, more-or-less simultaneous, tasks.

I am finding that step 2 often causes a deadlock. I don't understand why it should.

I'm showing my code below, though I doubt the issue is specific to this code:

public class TestSelfRefDeadlock
{
    private async Task CreateSelfRef_ThenDelete_Deletes() {
        var sr = new SelfRef
        {
            Id = Guid.NewGuid(),
            Name = "SR"
        };
        var factory = new SelfRefDbFactory();
        using (var db = factory.Create()) {
            db.Add(sr);
            await db.SaveChangesAsync();  // EDIT: Changing this to db.SaveChanges() appears to fix the problem, at least in this test scenario.
        }
        using (var db = factory.Create()) {
            db.SelfRef.Remove(sr);
            await db.SaveChangesAsync();
        }
    }

    private IEnumerable<Task> DeadlockTasks() {
        for (int i=0; i<2; i++) {
            yield return CreateSelfRef_ThenDelete_Deletes();
        }
    }

    [Fact]
    public async Task LotsaDeletes_DoNotDeadlock()
        => await Task.WhenAll(DeadlockTasks());
}

EDIT: I have confirmed that the same deadlock happens in EF6.

To create the table in my database:

USE [SelfReferential]
GO

/****** Object:  Table [dbo].[SelfRef]    Script Date: 3/20/2018 3:43:50 PM ******/
SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

CREATE TABLE [dbo].[SelfRef](
    [Id] [uniqueidentifier] NOT NULL,
    [SelfReferentialId] [uniqueidentifier] NULL,
    [Name] [nchar](10) NULL,
 CONSTRAINT [PK_SelfRef] PRIMARY KEY CLUSTERED 
(
    [Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

GO

ALTER TABLE [dbo].[SelfRef]  WITH CHECK ADD  CONSTRAINT [FK_SelfRef_SelfRef] FOREIGN KEY([SelfReferentialId])
REFERENCES [dbo].[SelfRef] ([Id])
GO

ALTER TABLE [dbo].[SelfRef] CHECK CONSTRAINT [FK_SelfRef_SelfRef]
GO

To generate the entities:

Scaffold-DbContext "Server=localhost;Database=SelfReferential;Trusted_Connection=True;" Microsoft.EntityFrameworkCore.SqlServer -Context SelfRefDb -OutputDir Entities -Force 

The DbFactory class:

public class SelfRefDbFactory : IFactory<SelfRefDb>
{
    private const string str1 = @"Data Source=MyPcName;Initial Catalog=SelfReferential;Integrated Security=True;ApplicationIntent=ReadWrite;";
    private const string str2 = @"Data Source=MyPcName;Initial Catalog=SelfReferential;Integrated Security=True;ApplicationIntent=ReadWrite;MultipleActiveResultSets=True";
    public SelfRefDb Create() {
        var options = new DbContextOptionsBuilder<SelfRefDb>()
            .UseSqlServer(str1).Options;
        return new SelfRefDb(options);
    }
}

The error message:

Message: System.InvalidOperationException : An exception has been raised that is likely due to a transient failure. If you are connecting to a SQL Azure database consider using SqlAzureExecutionStrategy.
---- Microsoft.EntityFrameworkCore.DbUpdateException : An error occurred while updating the entries. See the inner exception for details.
-------- System.Data.SqlClient.SqlException : Transaction (Process ID 58) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.

Some Sql events from the profiler are below. I am skipping over numerous "Audit Login" and "Audit Logout" events, and the fields are copied one-by-one. There must be a better way to extract things, but I don't know what it is.

exec sp_executesql N'SET NOCOUNT ON;
INSERT INTO [SelfRef] ([Id], [Name], [SelfReferentialId])
VALUES (@p0, @p1, @p2);
',N'@p0 uniqueidentifier,@p1 nvarchar(10),@p2 
uniqueidentifier',@p0='93671E2E-28E5-414D-A3DB-239FA433640C',@p1=N'SR',@p2=NULL

This particular run was with two threads. After two events like the above, I saw two of:

exec sp_reset_connection 

then two like this:

exec sp_executesql N'SET NOCOUNT ON;
DELETE FROM [SelfRef]
WHERE [Id] = @p0;
SELECT @@ROWCOUNT;

',N'@p0 uniqueidentifier',@p0='F5B53458-08C5-485E-8364-2A2842E95158'

Two more connection resets, then it was done.

Deadlock xml:

<deadlock-list>
    <deadlock victim="process1fe3db6b468">
        <process-list>
            <process id="process1fe3db6b468" taskpriority="0" logused="300" waitresource="KEY: 14:72057594041401344 (427c492d0b23)" waittime="147" ownerId="218910" transactionname="user_transaction" lasttranstarted="2018-03-22T14:33:57.880" XDES="0x2021f8bc408" lockMode="S" schedulerid="2" kpid="8540" status="suspended" spid="53" sbid="0" ecid="0" priority="0" trancount="2" lastbatchstarted="2018-03-22T14:33:57.883" lastbatchcompleted="2018-03-22T14:33:57.880" lastattention="1900-01-01T00:00:00.880" clientapp=".Net SqlClient Data Provider" hostname="WILLIAMASUS" hostpid="14656" loginname="MicrosoftAccount\jockusch@gmail.com" isolationlevel="read committed (2)" xactid="218910" currentdb="14" lockTimeout="4294967295" clientoption1="673185824" clientoption2="128056">
                <executionStack>
                    <frame procname="adhoc" line="2" stmtstart="78" stmtend="154" sqlhandle="0x0200000087849c297464e5637211740e8fde989bf9ffc37a0000000000000000000000000000000000000000">
unknown     </frame>
                    <frame procname="unknown" line="1" sqlhandle="0x0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000">
unknown     </frame>
                </executionStack>
                <inputbuf>
(@p0 uniqueidentifier)SET NOCOUNT ON;
DELETE FROM [SelfRef]
WHERE [Id] = @p0;
SELECT @@ROWCOUNT;

    </inputbuf>
            </process>
            <process id="process1fe3db684e8" taskpriority="0" logused="300" waitresource="KEY: 14:72057594041401344 (8e30f77e2707)" waittime="146" ownerId="218908" transactionname="user_transaction" lasttranstarted="2018-03-22T14:33:57.880" XDES="0x20227f6b458" lockMode="S" schedulerid="1" kpid="8300" status="suspended" spid="54" sbid="0" ecid="0" priority="0" trancount="2" lastbatchstarted="2018-03-22T14:33:57.883" lastbatchcompleted="2018-03-22T14:33:57.880" lastattention="1900-01-01T00:00:00.880" clientapp=".Net SqlClient Data Provider" hostname="WILLIAMASUS" hostpid="14656" loginname="MicrosoftAccount\jockusch@gmail.com" isolationlevel="read committed (2)" xactid="218908" currentdb="14" lockTimeout="4294967295" clientoption1="673185824" clientoption2="128056">
                <executionStack>
                    <frame procname="adhoc" line="2" stmtstart="78" stmtend="154" sqlhandle="0x0200000087849c297464e5637211740e8fde989bf9ffc37a0000000000000000000000000000000000000000">
unknown     </frame>
                    <frame procname="unknown" line="1" sqlhandle="0x0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000">
unknown     </frame>
                </executionStack>
                <inputbuf>
(@p0 uniqueidentifier)SET NOCOUNT ON;
DELETE FROM [SelfRef]
WHERE [Id] = @p0;
SELECT @@ROWCOUNT;

    </inputbuf>
            </process>
        </process-list>
        <resource-list>
            <keylock hobtid="72057594041401344" dbid="14" objectname="SelfReferential.dbo.SelfRef" indexname="PK_SelfRef" id="lock20229074e00" mode="X" associatedObjectId="72057594041401344">
                <owner-list>
                    <owner id="process1fe3db684e8" mode="X"/>
                </owner-list>
                <waiter-list>
                    <waiter id="process1fe3db6b468" mode="S" requestType="wait"/>
                </waiter-list>
            </keylock>
            <keylock hobtid="72057594041401344" dbid="14" objectname="SelfReferential.dbo.SelfRef" indexname="PK_SelfRef" id="lock20229073c80" mode="X" associatedObjectId="72057594041401344">
                <owner-list>
                    <owner id="process1fe3db6b468" mode="X"/>
                </owner-list>
                <waiter-list>
                    <waiter id="process1fe3db684e8" mode="S" requestType="wait"/>
                </waiter-list>
            </keylock>
        </resource-list>
    </deadlock>
</deadlock-list>

Best Answer

The deadlock XML indicates that the two sessions are fighting over two different rows, each session having an e(X)clusive lock on one and requesting a (S)hared lock on the other.

Given:

  1. Deleting a PK requires that it first ensure that there are no existing FK references to it
  2. The Transaction Isolation Level is "Read Committed"
  3. You are using connection pooling
  4. The two competing sessions are coming from the same application (as per the hostpid value)
  5. It seems that doing Step 1 as async allows for the problem while not saving async doesn't
  6. There is no indication that transactions are being used, though EF could be doing that behind the scenes, especially since the deadlock XML shows trancount="2" for both sessions

it is possible that either:

  1. using the async option on the save changes the timing and/or whether or not the app layer is starting a transaction, or
  2. connection pooling is allowing the threads to swap which connection/session they were using

Now, the connection pooling stuff (number 2) generally shouldn't cause any issues, but since there are scenarios where it can (such as if Distributed Transactions are being used), I didn't want to rule it out. And, since I do not know how EF and/or the async option handles things, it could very well be a combination of async and connection pooling.

So, why don't you first try keeping the async save for Step 1 but disable connection pooling by adding Pooling=false; to your connection string.

Of course, whether or not disabling connection pooling helps, given that not using async on the save solves the issue (or at least appears to thus far), you should consider not using async when creating items. Maybe only use that for deletes and selects? Even if we determine the exact change in behavior between using and not using async on Step 1, it might not be anything that can be worked around (or at least not worked around without doing things that probably shouldn't be done).