Sql-server – Ms sql MERGE INTO locks whole table for updates

Tags: locking, merge, optimization, sql-server

I have a table for statistic values, it holds millions of records, which is defined like this:

CREATE TABLE [dbo].[Statistic]
(
    [Id] [INT] IDENTITY(1, 1) NOT NULL
  , [EntityId] [INT] NULL
  , [EntityTypeId] [UNIQUEIDENTIFIER] NOT NULL
  , [ValueTypeId] [UNIQUEIDENTIFIER] NOT NULL
  , [Value] [DECIMAL](19, 5) NOT NULL
  , [Date] [DATETIME2](7) NULL
  , [AggregateTypeId] [INT] NOT NULL
  , [JsonData] [NVARCHAR](MAX) NULL
  , [WeekDay] AS (DATEDIFF(DAY, CONVERT([DATETIME], '19000101', (112)), [Date]) % (7) + (1)) PERSISTED
  , CONSTRAINT [PK_Statistic]
        PRIMARY KEY NONCLUSTERED ([Id] ASC)
);

CREATE UNIQUE CLUSTERED INDEX [IX_Statistic_EntityId_EntityTypeId_ValueTypeId_AggregateTypeId_Date]
ON [dbo].[Statistic] (
                         [EntityId] ASC
                       , [EntityTypeId] ASC
                       , [ValueTypeId] ASC
                       , [AggregateTypeId] ASC
                       , [Date] ASC
                     );

CREATE NONCLUSTERED INDEX [IX_Date] ON [dbo].[Statistic] ([Date] ASC);

CREATE NONCLUSTERED INDEX [IX_EntityId]
ON [dbo].[Statistic] ([EntityId] ASC)
INCLUDE ([Id]);

CREATE NONCLUSTERED INDEX [IX_EntityType_Agg_Date]
ON [dbo].[Statistic] ([EntityTypeId] ASC, [AggregateTypeId] ASC, [Date] ASC)
INCLUDE ([Id], [EntityId], [ValueTypeId]);

CREATE NONCLUSTERED INDEX [IX_Statistic_ValueTypeId]
ON [dbo].[Statistic] ([ValueTypeId] ASC)
INCLUDE ([Id]);

CREATE NONCLUSTERED INDEX [IX_WeekDay]
ON [dbo].[Statistic] ([AggregateTypeId] ASC, [WeekDay] ASC, [Date] ASC)
INCLUDE ([Id]);

During updates with MERGE, SQL Server locks the whole table instead of pages/rows. @p0 (aliased as inTbl) is a key/value data table passed as a parameter:

MERGE INTO Statistic AS stat
USING
    (SELECT inTbl.EntityId, inTbl.Value FROM @p0 AS inTbl) AS src
ON src.EntityId = stat.EntityId
   AND stat.EntityTypeId = @p1
   AND stat.ValueTypeId = @p2
   AND stat.Date IS NULL
   AND stat.AggregateTypeId = @p3
WHEN MATCHED THEN
    UPDATE SET stat.Value = src.value
WHEN NOT MATCHED BY TARGET THEN
    INSERT (EntityTypeId, ValueTypeId, Date, AggregateTypeId, EntityId, Value)
    VALUES
    (@p4, @p5, @p6, @p7, src.entityId, src.value);

So, I have 2 problems:
1) the merge sometimes takes forever to finish

2) updates like this wait for merge to finish:

UPDATE [dbo].[Statistic]
SET [Value] = @p0, [JsonData] = @p1
WHERE [EntityTypeId] = @p2
      AND [ValueTypeId] = @p3
      AND [Date] = @p4
      AND [EntityId] = @p5
      AND [AggregateTypeId] = @p6;

I have plans/locks files for the queries, but they are rather big, so here they are

before index rebuild: https://www.brentozar.com/pastetheplan/?id=S19EgxYIB

after index rebuild: https://www.brentozar.com/pastetheplan/?id=SyjexxtLH

What can be the problem? This happens occasionally and may sometimes go away after clustered index rebuild.

The clustered index becomes 90+% fragmented in a day or so. How can I prevent this fragmentation?

Best Answer

I have a table for statistic values, it holds millions of records

...

The clustered index becomes 90+% fragmented in a day or so.

Look at your clustered index: its key is 48 bytes wide (4 + 16 + 16 + 4 + 8). That is not a good choice, because your table is big and you also have 5 nonclustered indexes. All of them carry these 48 bytes at every index level, so every nonclustered index occupies at least twice the space it needs.
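If you want to check that figure yourself, here is a minimal sketch that adds up the declared byte widths of the clustered key columns from the catalog views. It assumes the table is dbo.Statistic as defined in the question, and it only counts column widths, not row or page overhead.

```sql
-- Sum the declared widths of the clustered index key columns.
-- For the definition in the question this adds up
-- int (4) + uniqueidentifier (16) + uniqueidentifier (16) + int (4) + datetime2(7) (8).
SELECT i.name            AS index_name,
       SUM(c.max_length) AS key_bytes
FROM sys.indexes AS i
JOIN sys.index_columns AS ic
    ON ic.object_id = i.object_id
   AND ic.index_id = i.index_id
JOIN sys.columns AS c
    ON c.object_id = ic.object_id
   AND c.column_id = ic.column_id
WHERE i.object_id = OBJECT_ID(N'dbo.Statistic')
  AND i.type = 1                    -- 1 = clustered index
  AND ic.is_included_column = 0     -- key columns only
GROUP BY i.name;
```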

IMHO, the first thing to do, if possible, is to change the clustered index key. Your clustered index can be defined on the identity column: it will be unique, always increasing, and narrow, and this will reduce your clustered index fragmentation. If the JsonData field is never updated, clustered index fragmentation will be 0.
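A rough sketch of what that change could look like, using the object names from the question. The name UX_Statistic_NaturalKey is made up for illustration, and I am assuming that no foreign keys reference PK_Statistic. On a table with millions of rows each step rebuilds data, so try it on a copy first.

```sql
-- Sketch only: these are size-of-data operations on a large table.
-- Assumes no foreign keys reference PK_Statistic.

-- 1. Drop the nonclustered primary key so it can be re-created as clustered.
ALTER TABLE dbo.Statistic DROP CONSTRAINT PK_Statistic;

-- 2. Drop the wide clustered index (the table is a heap until step 3).
DROP INDEX IX_Statistic_EntityId_EntityTypeId_ValueTypeId_AggregateTypeId_Date
    ON dbo.Statistic;

-- 3. Cluster on the narrow, always increasing identity column.
ALTER TABLE dbo.Statistic
    ADD CONSTRAINT PK_Statistic PRIMARY KEY CLUSTERED (Id ASC);

-- 4. Keep the uniqueness rule and the seek path used by the MERGE and UPDATE.
--    UX_Statistic_NaturalKey is a hypothetical name.
CREATE UNIQUE NONCLUSTERED INDEX UX_Statistic_NaturalKey
    ON dbo.Statistic (EntityId, EntityTypeId, ValueTypeId, AggregateTypeId, [Date]);
```

Steps 2 and 3 each rebuild the existing nonclustered indexes, so on a busy system it may be cheaper to drop those first and re-create them at the end.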

This will also decrease your insert time: right now too much time is spent logging the page splits caused by inserts into the clustered index.
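To see how fragmented each index is before and after such a change, something like the following can be used. It is a sketch that assumes the table is dbo.Statistic and is run in the database that contains it; LIMITED is the cheapest scan mode.

```sql
-- Fragmentation and size per index of dbo.Statistic.
SELECT i.name AS index_name,
       ps.index_type_desc,
       ps.avg_fragmentation_in_percent,
       ps.page_count
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID(N'dbo.Statistic'), NULL, NULL, 'LIMITED') AS ps
JOIN sys.indexes AS i
    ON i.object_id = ps.object_id
   AND i.index_id = ps.index_id
ORDER BY ps.avg_fragmentation_in_percent DESC;
```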

Now to your second problem: lock escalation. As you said, every batch contains 2000 rows in the source table, but according to the estimated plan they cause 3402 rows to be inserted, and that is for the clustered index alone. You have 5 nonclustered indexes, so in one statement you insert at least 6 * 2000 = 12000 rows, or maybe all 6 * 3402 = 20412 rows if the estimates are correct.

Lock escalation triggers on 5000 locks per statement:

In addition to escalating locks when an instance-wide threshold is crossed, SQL Server will also escalate locks when any individual session acquires more than 5,000 locks in a single statement. In this case, there is no randomness in choosing which session will get its locks escalated; it is the session that acquired the locks.

In your case they are very probably row locks, because your clustered index key is random. Inserts into an always increasing key could take page locks, but your clustered key is really random. In any case, inserts into the nonclustered indexes are random too, so it's normal that the server chose row locks.
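One way to confirm that escalation is really happening is to look at the per-index counters in sys.dm_db_index_operational_stats. This is a sketch that assumes the table is dbo.Statistic; the counters are cumulative since the index metadata was last cached, so compare readings taken before and after a MERGE run.

```sql
-- Lock counts and escalation attempts/successes per index of dbo.Statistic.
SELECT i.name AS index_name,
       os.row_lock_count,
       os.page_lock_count,
       os.index_lock_promotion_attempt_count,  -- escalation attempts
       os.index_lock_promotion_count           -- escalations that succeeded
FROM sys.dm_db_index_operational_stats(
         DB_ID(), OBJECT_ID(N'dbo.Statistic'), NULL, NULL) AS os
JOIN sys.indexes AS i
    ON i.object_id = os.object_id
   AND i.index_id = os.index_id
ORDER BY os.index_lock_promotion_count DESC;
```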

So you can either disable lock escalation on your table or split your batches into 1000 rows per batch or even fewer; this should be tested.
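Both options could be sketched roughly as follows. #Source and #Batch are hypothetical stand-ins for the @p0 table parameter, I am assuming EntityId is unique and not NULL in the source, and the batch size of 1000 is only the starting point mentioned above, not a tested value. Note that DISABLE prevents escalation in most cases but does not rule out table locks completely.

```sql
-- Option 1: keep this table's locks from escalating to a table lock.
ALTER TABLE dbo.Statistic SET (LOCK_ESCALATION = DISABLE);

-- Check the current setting.
SELECT name, lock_escalation_desc
FROM sys.tables
WHERE object_id = OBJECT_ID(N'dbo.Statistic');

-- Option 2: feed the MERGE in smaller batches.
-- #Source is a hypothetical stand-in for the @p0 table parameter.
CREATE TABLE #Source (EntityId int NOT NULL PRIMARY KEY, Value decimal(19, 5) NOT NULL);
CREATE TABLE #Batch  (EntityId int NOT NULL PRIMARY KEY, Value decimal(19, 5) NOT NULL);

DECLARE @BatchSize int = 1000;      -- assumption: tune by testing
DECLARE @LastEntityId int = NULL;   -- highest EntityId already processed

WHILE 1 = 1
BEGIN
    TRUNCATE TABLE #Batch;

    -- Take the next slice of source rows in EntityId order.
    INSERT INTO #Batch (EntityId, Value)
    SELECT TOP (@BatchSize) s.EntityId, s.Value
    FROM #Source AS s
    WHERE @LastEntityId IS NULL OR s.EntityId > @LastEntityId
    ORDER BY s.EntityId;

    IF @@ROWCOUNT = 0 BREAK;        -- nothing left to process

    SELECT @LastEntityId = MAX(EntityId) FROM #Batch;

    -- Run the MERGE from the question here, with "USING #Batch AS src".
END;

DROP TABLE #Batch;
DROP TABLE #Source;
```

Run each MERGE as its own statement outside a long transaction, so the locks of one batch are released before the next batch starts.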

Here is a small repro in response to this comment:

inserts can't take locks (can't lock a resource that doesn't exist)

-- Start from a clean test table with a clustered primary key and two nonclustered indexes.
if object_id('dbo.t') is not null drop table dbo.t;
create table dbo.t(id int identity primary key, col1 varchar(10), col2 varchar(10));
create index ix_col1 on dbo.t(col1);
create index ix_col2 on dbo.t(col2);

-- Insert 1000 rows inside an open transaction so the locks are still held.
begin tran
insert into dbo.t (col1, col2)
select top 1000 'aaa', 'bbb'
from sys.columns c1 cross join sys.columns c2;

-- List the locks this session currently holds.
select *
from sys.dm_tran_locks
where resource_type <> 'DATABASE'
      and request_session_id = @@spid
order by resource_associated_entity_id,
         resource_type;

rollback tran;

Related Solutions

Sql-server – SELECT/INSERT Deadlock

On the face of it, this looks like a classic lookup deadlock. The essential ingredients for this deadlock pattern are:

a SELECT query that uses a non-covering nonclustered index with a Key Lookup
an INSERT query that modifies the clustered index and then the nonclustered index

The SELECT accesses the nonclustered index first, then the clustered index. The INSERT accesses the clustered index first, then the nonclustered index. Accessing the same resources in a different order while acquiring incompatible locks is a great way to 'achieve' a deadlock of course.

In this case, the SELECT query is:

[Image: SELECT query plan]

...and the INSERT query is:

[Image: INSERT query plan]

Notice the nonclustered index maintenance highlighted in green.

We would need to see the serial version of the SELECT plan in case it is very different from the parallel version, but as Jonathan Kehayias notes in his guide to Handling Deadlocks, this particular deadlock pattern is very sensitive to timing and internal query execution implementation details. This type of deadlock often comes and goes without an obvious external reason.

Given access to the system concerned, and suitable permissions, I am certain we could eventually work out exactly why the deadlock occurs with the parallel plan but not the serial (assuming the same general shape). Potential lines of enquiry include checking for optimized nested loops and/or prefetching - both of which can internally escalate the isolation level to REPEATABLE READ for the duration of the statement. It is also possible that some feature of parallel index seek range assignment contributes to the issue. If the serial plan becomes available, I might spend some time looking into the details further, as it is potentially interesting.

The usual solution for this type of deadlocking is to make the index covering, though the number of columns in this case might make that impractical (and besides, we are not supposed to mess with such things on SharePoint, I am told). Ultimately, the recommendation for serial-only plans when using SharePoint is there for a reason (though not necessarily a good one, when it comes right down to it). If the change in cost threshold for parallelism fixes the issue for the moment, this is good. Longer term, I would probably look to separate the workloads, perhaps using Resource Governor so that SharePoint internal queries get the desired MAXDOP 1 behaviour and the other application is able to use parallelism.

The question of exchanges appearing in the deadlock trace seems a red herring to me; simply a consequence of the independent threads owning resources which technically must appear in the tree. I cannot see anything to suggest that the exchanges themselves are contributing directly to the deadlocking issue.

Sql-server – Row estimates always too low

(summarizing my comments and putting as answer)

A query rewrite will solve the issue of getting low row estimates, as Joe Chang explains in his blog post Query Optimizer Gone Wild - Full-Text.

CONTAINS is "a predicate used in a WHERE clause" per the Microsoft documentation, while CONTAINSTABLE acts as a table.

You get a much better plan (a merge join) with CONTAINSTABLE, whereas the actual plan with CONTAINS uses a nested loops join with low row estimates.

You can rewrite the query as:

SELECT TOP 30 p.PersonId,
              p.PersonParentId,
              p.PersonName,
              p.PersonPostCode
FROM dbo.People p
-- CONTAINSTABLE takes the table as well as the column and returns [KEY] and [RANK].
-- This assumes PersonId is the full-text key column of dbo.People.
left join containstable (dbo.People, ContactFullText, '"mr" AND "ch*"') cf on cf.[KEY] = p.PersonId
WHERE p.PersonDeletionDate IS NULL
      AND p.PersonCustomerId = 24
      --AND CONTAINS(ContactFullText, '"mr" AND "ch*"')
      AND p.PersonGroupId IN(197, 206, 186, 198)
      AND cf.[RANK] > 0
ORDER BY p.PersonParentId,
         p.PersonName;
