SQL Server – MERGE deadlocking prevention

Tags: deadlock, execution-plan, merge, sql-server

In one of our databases we have a table that is intensively and concurrently accessed by multiple threads. Threads update or insert rows via MERGE. There are also threads that delete rows on occasion, so the table data is very volatile. The threads doing upserts sometimes suffer from deadlocks. The issue looks similar to the one described in this question. The difference, though, is that in our case each thread updates or inserts exactly one row.

The simplified setup is as follows. The table is a heap with two unique nonclustered indexes:

CREATE TABLE [Cache]
(
    [UID] uniqueidentifier NOT NULL CONSTRAINT DF_Cache_UID DEFAULT (newid()),
    [ItemKey] varchar(200) NOT NULL,
    [FileName] nvarchar(255) NOT NULL,
    [Expires] datetime2(2) NOT NULL,
    CONSTRAINT [PK_Cache] PRIMARY KEY NONCLUSTERED ([UID])
)
GO
CREATE UNIQUE INDEX IX_Cache ON [Cache] ([ItemKey]);
GO

and the typical query is

DECLARE
    @itemKey varchar(200) = 'Item_0F3C43A6A6A14255B2EA977EA730EDF2',
    @fileName nvarchar(255) = 'File_0F3C43A6A6A14255B2EA977EA730EDF2.dat';

MERGE INTO [Cache] WITH (HOLDLOCK) T
USING (
    VALUES (@itemKey, @fileName, dateadd(minute, 10, sysdatetime()))
) S(ItemKey, FileName, Expires)
ON T.ItemKey = S.ItemKey
WHEN MATCHED THEN
    UPDATE
    SET
        T.FileName = S.FileName,
        T.Expires = S.Expires
WHEN NOT MATCHED THEN
    INSERT (ItemKey, FileName, Expires)
    VALUES (S.ItemKey, S.FileName, S.Expires)
OUTPUT deleted.FileName;

i.e., matching happens by the unique index key. The HOLDLOCK hint is there because of concurrency (as advised here).

I did a small investigation and the following is what I found.

In most cases the query execution plan is

index seek execution plan

with the following locking pattern

index seek locking pattern

i.e. an IX lock on the object followed by more granular locks.
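
As a side note, a locking pattern like this can also be inspected without a trace by wrapping the statement in an explicit transaction and querying sys.dm_tran_locks before rolling back; a rough sketch:

BEGIN TRANSACTION;

    -- ... run the MERGE statement from above here ...

    -- HOLDLOCK (SERIALIZABLE) keeps the locks until the end of the transaction,
    -- so they are still visible at this point
    SELECT resource_type, resource_description, request_mode, request_status
    FROM sys.dm_tran_locks
    WHERE request_session_id = @@SPID
    ORDER BY resource_type, request_mode;

ROLLBACK TRANSACTION;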

Sometimes, however, the query execution plan is different

table scan execution plan

(this plan shape can be forced by adding the INDEX(0) hint) and its locking pattern is

table scan locking pattern

Notice the X lock placed on the object after the IX has already been taken.
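
For reference, a sketch of the hinted statement that reproduces this scan plan shape (the INDEX(0) hint is the only change from the query above; variable values as before):

DECLARE
    @itemKey varchar(200) = 'Item_0F3C43A6A6A14255B2EA977EA730EDF2',
    @fileName nvarchar(255) = 'File_0F3C43A6A6A14255B2EA977EA730EDF2.dat';

-- INDEX(0) forces a scan of the heap, reproducing the "bad" plan shape
MERGE INTO [Cache] WITH (HOLDLOCK, INDEX(0)) T
USING (
    VALUES (@itemKey, @fileName, dateadd(minute, 10, sysdatetime()))
) S(ItemKey, FileName, Expires)
ON T.ItemKey = S.ItemKey
WHEN MATCHED THEN
    UPDATE
    SET
        T.FileName = S.FileName,
        T.Expires = S.Expires
WHEN NOT MATCHED THEN
    INSERT (ItemKey, FileName, Expires)
    VALUES (S.ItemKey, S.FileName, S.Expires)
OUTPUT deleted.FileName;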

Since two IX locks are compatible but two X locks are not, what happens under concurrency is a deadlock:

deadlock graph

Deadlock!
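
For anyone reproducing this, deadlock graphs like the one above can usually be retrieved from the default system_health Extended Events session; a rough sketch, assuming the session writes to .xel files in the default location:

-- Pull recent deadlock graphs from the default system_health session
SELECT
    x.event_xml.value('(/event/@timestamp)[1]', 'varchar(30)') AS event_time_utc,
    x.event_xml.query('/event/data/value/deadlock')            AS deadlock_graph
FROM
(
    SELECT CAST(event_data AS xml) AS event_xml
    FROM sys.fn_xe_file_target_read_file('system_health*.xel', NULL, NULL, NULL)
    WHERE object_name = 'xml_deadlock_report'
) AS x
ORDER BY event_time_utc DESC;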

And here the first part of the question arises. Is placing an X lock on the object after an IX legitimate? Isn't it a bug?

Documentation states:

Intent locks are named intent locks because they are acquired before a lock at the lower level, and therefore signal intent to place locks at a lower level.

and also

IX means the intention to update only some of the rows rather than
all of them

so placing an X lock on the object after an IX looks VERY suspicious to me.

First I attempted to prevent the deadlocking by adding table-level locking hints

MERGE INTO [Cache] WITH (HOLDLOCK, TABLOCK) T

and

MERGE INTO [Cache] WITH (HOLDLOCK, TABLOCKX) T

With TABLOCK in place the locking pattern becomes

merge holdlock tablock locking pattern

and with TABLOCKX the locking pattern is

merge holdlock tablockx locking pattern

Since two SIX locks (as well as two X locks) are not compatible, this prevents the deadlock effectively but, unfortunately, prevents concurrency as well (which is not desired).

My next attempts were adding PAGLOCK and ROWLOCK to make the locks more granular and reduce contention. Neither had any effect (the X lock on the object was still observed immediately after the IX).

My final attempt was forcing the "good" execution plan shape, with granular locking, by adding the FORCESEEK hint

MERGE INTO [Cache] WITH (HOLDLOCK, FORCESEEK(IX_Cache(ItemKey))) T

and it worked.

And here the second part of the question arises. Could it happen that FORCESEEK is ignored and the bad locking pattern used? (As I mentioned, PAGLOCK and ROWLOCK were seemingly ignored.)


Adding UPDLOCK had no effect (the X lock on the object was still observed after the IX).

Making the IX_Cache index clustered worked, as anticipated. It led to a plan with a Clustered Index Seek and granular locking. Additionally, I tried forcing a Clustered Index Scan, which also showed granular locking.

However, an additional observation. In the original setup, even with FORCESEEK(IX_Cache(ItemKey)) in place, if one changes the @itemKey variable declaration from varchar(200) to nvarchar(200), the execution plan becomes

index seek execution plan with nvarchar

See that a seek is used, BUT the locking pattern in this case again shows an X lock placed on the object after the IX.

So, it seems that forcing a seek does not necessarily guarantee granular locks (and hence the absence of deadlocks). I'm not confident that having a clustered index guarantees granular locking. Or does it?

My understanding (correct me if I'm wrong) is that locking is situational to a great degree, and a certain execution plan shape does not imply a certain locking pattern.

The question about the legitimacy of placing an X lock on the object after an IX is still open. And if it is legitimate, is there something one can do to prevent the object-level locking?

Best Answer

Is placing an IX followed by an X lock on the object legitimate? Is it a bug or not?

It looks a bit odd, but it is valid. At the time the IX is taken, the intention may well be to take X locks at a lower level. There's nothing to say that such locks must actually be taken. After all, there might not be anything to lock at the lower level; the engine cannot know that ahead of time. In addition, there may be optimizations such that lower level locks can be skipped (an example for IS and S locks can be seen here).

More specifically for the current scenario, it is true that serializable key-range locks are not available for a heap, so the only alternative is an X lock at the object level. In that sense, the engine might be able to detect early that an X lock will inevitably be required if the access method is a heap scan, and so avoid taking the IX lock.

On the other hand, locking is complex, and intent locks may sometimes be taken for internal reasons not necessarily related to the intention to take lower level locks. Taking IX may be the least invasive way of providing a required protection for some obscure edge case. For a similar sort of consideration, see Shared Lock issued on IsolationLevel.ReadUncommitted.

So, the current situation is unfortunate for your deadlock scenario, and it may be avoidable in principle, but that's not necessarily the same as being a 'bug'. You could report the issue via your normal support channel, or on Microsoft Connect, if you need a definitive answer on this.

Could it happen that FORCESEEK is ignored and the bad locking pattern used?

No. FORCESEEK is less of a hint and more of a directive. If the optimizer cannot find a plan that honours the 'hint', it will produce an error.

Forcing the index is a way of ensuring that key-range locks can be taken. Together with the update locks naturally taken when processing an access method for rows to change, this provides enough of a guarantee to avoid concurrency issues in your scenario.

If the schema of the table does not change (e.g. adding a new index), the hint is also enough to avoid this query deadlocking with itself. There is still a possibility of a cyclic deadlock with other queries that might access the heap before the nonclustered index (such as an update to the key of the nonclustered index).

...variable declaration from varchar(200) to nvarchar(200)...

This breaks the guarantee that a single row will be affected, so an Eager Table Spool is introduced for Halloween protection. As a further workaround for this, make the guarantee explicit with MERGE TOP (1) INTO [Cache]....
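
A sketch of how that might look, reusing the statement from the question (with @itemKey declared as nvarchar(200) in this scenario and the hints as before; only the TOP (1) is new):

-- TOP (1) makes the single-row guarantee explicit to the optimizer
MERGE TOP (1) INTO [Cache] WITH (HOLDLOCK, FORCESEEK(IX_Cache(ItemKey))) T
USING (
    VALUES (@itemKey, @fileName, dateadd(minute, 10, sysdatetime()))
) S(ItemKey, FileName, Expires)
ON T.ItemKey = S.ItemKey
WHEN MATCHED THEN
    UPDATE
    SET
        T.FileName = S.FileName,
        T.Expires = S.Expires
WHEN NOT MATCHED THEN
    INSERT (ItemKey, FileName, Expires)
    VALUES (S.ItemKey, S.FileName, S.Expires)
OUTPUT deleted.FileName;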

My understanding [...] is that locking is situational to a great degree, and a certain execution plan shape does not imply a certain locking pattern.

There is certainly much more going on than is visible in an execution plan. You can force a certain plan shape with e.g. a plan guide, but the engine may still decide to take different locks at runtime. The chances of that are fairly low if you incorporate the TOP (1) element above.

General remarks

It is somewhat unusual to see a heap table being used in this way. You should consider the merits of converting it to a clustered table, perhaps using the index Dan Guzman suggested in a comment:

CREATE UNIQUE CLUSTERED INDEX IX_Cache ON [Cache] ([ItemKey]);

This may have important space reuse advantages, as well as providing a good workaround for the current deadlocking issue.
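
If the table already contains data, one possible way to perform the conversion in place is with the DROP_EXISTING option (a sketch; the other nonclustered index is rebuilt as part of the operation):

-- Convert the existing nonclustered IX_Cache into the clustered index
CREATE UNIQUE CLUSTERED INDEX IX_Cache
ON [Cache] ([ItemKey])
WITH (DROP_EXISTING = ON);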

MERGE is also slightly unusual to see in a high-concurrency environment. Somewhat counter-intuitively, it is often more efficient to perform separate INSERT and UPDATE statements, for example:

DECLARE
    @itemKey varchar(200) = 'Item_0F3C43A6A6A14255B2EA977EA730EDF2',
    @fileName nvarchar(255) = 'File_0F3C43A6A6A14255B2EA977EA730EDF2.dat';

BEGIN TRANSACTION;

    DECLARE @expires datetime2(2) = DATEADD(MINUTE, 10, SYSDATETIME());

    UPDATE TOP (1) dbo.Cache WITH (SERIALIZABLE, UPDLOCK)
    SET [FileName] = @fileName,
        Expires = @expires
    OUTPUT Deleted.[FileName]
    WHERE
        ItemKey = @itemKey;

    IF @@ROWCOUNT = 0
        INSERT dbo.Cache
            (ItemKey, [FileName], Expires)
        VALUES
            (@itemKey, @fileName, @expires);

COMMIT TRANSACTION;

Note how the RID lookup is no longer necessary:

Execution plan

If you can guarantee the existence of a unique index on ItemKey (as in the question) the redundant TOP (1) in the UPDATE can be removed, giving the simpler plan:

Simplified update
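
In code, that simplified statement might look like this (a sketch; variables as declared in the example above):

-- The unique index on ItemKey guarantees at most one row, so TOP (1) is redundant
UPDATE dbo.Cache WITH (SERIALIZABLE, UPDLOCK)
SET [FileName] = @fileName,
    Expires = @expires
OUTPUT Deleted.[FileName]
WHERE ItemKey = @itemKey;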

Both INSERT and UPDATE plans qualify for a trivial plan in either case. MERGE always requires full cost-based optimization.

See the related Q & A SQL Server 2014 Concurrent input issue for the correct pattern to use, and more information about MERGE.

Deadlocks cannot always be prevented. They can be reduced to a minimum with careful coding and design, but the application should always be prepared to handle the odd deadlock gracefully (e.g. recheck conditions then retry).
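
As an illustration of that last point, a minimal retry wrapper around the INSERT/UPDATE pair above might look something like the following sketch (the retry count, and retrying only on deadlock error 1205, are choices to adapt):

DECLARE
    @itemKey  varchar(200)  = 'Item_0F3C43A6A6A14255B2EA977EA730EDF2',
    @fileName nvarchar(255) = 'File_0F3C43A6A6A14255B2EA977EA730EDF2.dat',
    @expires  datetime2(2)  = DATEADD(MINUTE, 10, SYSDATETIME()),
    @retry    int           = 3;

WHILE @retry > 0
BEGIN
    BEGIN TRY
        BEGIN TRANSACTION;

            UPDATE TOP (1) dbo.Cache WITH (SERIALIZABLE, UPDLOCK)
            SET [FileName] = @fileName,
                Expires = @expires
            OUTPUT Deleted.[FileName]
            WHERE ItemKey = @itemKey;

            IF @@ROWCOUNT = 0
                INSERT dbo.Cache (ItemKey, [FileName], Expires)
                VALUES (@itemKey, @fileName, @expires);

        COMMIT TRANSACTION;

        SET @retry = 0;                 -- success: stop looping
    END TRY
    BEGIN CATCH
        IF XACT_STATE() <> 0
            ROLLBACK TRANSACTION;

        IF ERROR_NUMBER() = 1205 AND @retry > 1
            SET @retry -= 1;            -- deadlock victim: try again
        ELSE
        BEGIN
            SET @retry = 0;
            THROW;                      -- not a deadlock, or retries exhausted
        END;
    END CATCH;
END;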

If you have complete control over the processes that access the object in question, you might also consider using application locks to serialize access to individual elements, as described in SQL Server Concurrent Inserts and Deletes.
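
A rough sketch of that idea using sp_getapplock (the resource name, lock mode, and timeout are illustrative choices):

DECLARE
    @itemKey varchar(200) = 'Item_0F3C43A6A6A14255B2EA977EA730EDF2',
    @result  int;

BEGIN TRANSACTION;

    -- One application lock per cache item; other sessions wanting the same
    -- ItemKey wait here instead of contending for row/page/object locks
    EXEC @result = sys.sp_getapplock
        @Resource    = @itemKey,
        @LockMode    = 'Exclusive',
        @LockOwner   = 'Transaction',
        @LockTimeout = 5000;            -- milliseconds; negative result = not granted

    IF @result < 0
    BEGIN
        ROLLBACK TRANSACTION;
        THROW 50000, 'Could not acquire the application lock for this item.', 1;
    END;

    -- ... perform the UPDATE / INSERT pair from above here ...

COMMIT TRANSACTION;  -- the transaction-owned applock is released here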