SQL Server – Massive INSERTs blocking SELECTs

locking, sql-server, sql-server-2008

I have a problem with a massive amount of INSERTs that are blocking my SELECT operations.

Schema

I have a table like this:

CREATE TABLE [InverterData](
    [InverterID] [bigint] NOT NULL,
    [TimeStamp] [datetime] NOT NULL,    
    [ValueA] [decimal](18, 2) NULL,
    [ValueB] [decimal](18, 2) NULL
    CONSTRAINT [PrimaryKey_e149e28f-5754-4229-be01-65fafeebce16] PRIMARY KEY CLUSTERED 
    (
        [TimeStamp] DESC,
        [InverterID] ASC
    ) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF
    , IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON
    , ALLOW_PAGE_LOCKS = ON)
)

I also have this little helper procedure that lets me insert or update (update on conflict) using the MERGE command:

CREATE PROCEDURE [InsertOrUpdateInverterData]
    @InverterID bigint, @TimeStamp datetime
    , @ValueA decimal(18,2), @ValueB decimal(18,2)
AS
BEGIN
    MERGE [InverterData] AS TARGET
        USING (VALUES (@InverterID, @TimeStamp, @ValueA, @ValueB))
        AS SOURCE ([InverterID], [TimeStamp], [ValueA], [ValueB])
        ON TARGET.[InverterID] = @InverterID AND TARGET.[TimeStamp] = @TimeStamp
    WHEN MATCHED THEN
        UPDATE
        SET [ValueA] = SOURCE.[ValueA], [ValueB] = SOURCE.[ValueB]              
    WHEN NOT MATCHED THEN
        INSERT ([InverterID], [TimeStamp], [ValueA], [ValueB]) 
        VALUES (SOURCE.[InverterID], SOURCE.[TimeStamp], SOURCE.[ValueA], SOURCE.[ValueB]);
END

Usage

I now have service instances running on multiple servers that perform massive updates by calling the [InsertOrUpdateInverterData] procedure rapidly.
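Each data point is written with a single procedure call, for example (the values here are made up for illustration):

-- Hypothetical example call; the real values come from the inverters.
EXEC [InsertOrUpdateInverterData]
    @InverterID = 42,
    @TimeStamp  = '2017-03-01T12:05:00',
    @ValueA     = 123.45,
    @ValueB     = 67.89;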

There is also a website that does SELECT queries on the [InverterData] table.

Problem

SELECT queries on the [InverterData] table take wildly different amounts of time, depending on the INSERT activity of my service instances.
If I pause all service instances, the SELECTs are lightning-fast; while the instances perform rapid inserts, the SELECTs get really slow or even time out.

Attempts

I've run some SELECTs against [sys.dm_tran_locks] to find the blocking processes, like this:

SELECT
tl.request_session_id,
wt.blocking_session_id,
OBJECT_NAME(p.OBJECT_ID) BlockedObjectName,
h1.TEXT AS RequestingText,
h2.TEXT AS BlockingText,
tl.request_mode

FROM sys.dm_tran_locks AS tl

INNER JOIN sys.dm_os_waiting_tasks AS wt ON tl.lock_owner_address = wt.resource_address
INNER JOIN sys.partitions AS p ON p.hobt_id = tl.resource_associated_entity_id
INNER JOIN sys.dm_exec_connections ec1 ON ec1.session_id = tl.request_session_id
INNER JOIN sys.dm_exec_connections ec2 ON ec2.session_id = wt.blocking_session_id
CROSS APPLY sys.dm_exec_sql_text(ec1.most_recent_sql_handle) AS h1
CROSS APPLY sys.dm_exec_sql_text(ec2.most_recent_sql_handle) AS h2

This is the result:

(Screenshot of the query result: the waiting sessions are requesting S locks.)

S = Shared. The holding session is granted shared access to the resource.

Question

Why are the SELECTs blocked by the [InsertOrUpdateInverterData] procedure that is only using MERGE commands?

Do I have to use some kind of transaction with a defined isolation level inside [InsertOrUpdateInverterData]?

Update 1 (related to question from @Paul)

Based on SQL Server's internal reporting, [InsertOrUpdateInverterData] shows the following statistics:

  • Average CPU-Time: 0.12ms
  • Average Read processes: 5.76 per/s
  • Average Write processes: 0.4 per/s

Based on this, it looks like the MERGE command is mostly busy with read operations that lock the table!(?)

Update 2 (related to question from @Paul)

The [InverterData] table has the following storage stats:

  • Data space: 26,901.86 MB
  • Row count: 131,827,749
  • Partitioned: true
  • Partition count: 62

Here is the (almost) complete sp_WhoIsActive result set:

SELECT command

  • dd hh:mm:ss.mss: 00 00:01:01.930
  • session_id: 73
  • wait_info: (12629ms)LCK_M_S
  • CPU: 198
  • blocking_session_id: 146
  • reads: 99,368
  • writes: 0
  • status: suspended
  • open_tran_count: 0

Blocking [InsertOrUpdateInverterData] command

  • dd hh:mm:ss.mss: 00 00:00:00.330
  • session_id: 146
  • wait_info: NULL
  • CPU: 3,972
  • blocking_session_id: NULL
  • reads: 376,95
  • writes: 126
  • status: sleeping
  • open_tran_count: 1

Best Answer

First, though slightly unrelated to the main question, your MERGE statement is potentially at risk of errors due to a race condition. The problem, in a nutshell, is that it is possible for multiple concurrent threads to conclude that the target row does not exist, resulting in colliding attempts to insert. The root cause is that it is not possible to take a shared or update lock on a row that does not exist. The solution is to add a hint:

MERGE [dbo].[InverterData] WITH (SERIALIZABLE) AS [TARGET]

The serializable isolation level hint ensures the key range where the row would go is locked. You have a unique index to support range locking, so this hint will not have an adverse effect on locking, you will simply gain protection against this potential race condition.
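Applied to the procedure in the question, that hint is the only change needed; a sketch (assuming the table lives in the dbo schema):

ALTER PROCEDURE [InsertOrUpdateInverterData]
    @InverterID bigint, @TimeStamp datetime
    , @ValueA decimal(18,2), @ValueB decimal(18,2)
AS
BEGIN
    -- SERIALIZABLE (a synonym for HOLDLOCK here) takes a key-range lock
    -- covering the row's position, closing the insert race condition window.
    MERGE [dbo].[InverterData] WITH (SERIALIZABLE) AS TARGET
        USING (VALUES (@InverterID, @TimeStamp, @ValueA, @ValueB))
        AS SOURCE ([InverterID], [TimeStamp], [ValueA], [ValueB])
        ON TARGET.[InverterID] = @InverterID AND TARGET.[TimeStamp] = @TimeStamp
    WHEN MATCHED THEN
        UPDATE
        SET [ValueA] = SOURCE.[ValueA], [ValueB] = SOURCE.[ValueB]
    WHEN NOT MATCHED THEN
        INSERT ([InverterID], [TimeStamp], [ValueA], [ValueB])
        VALUES (SOURCE.[InverterID], SOURCE.[TimeStamp], SOURCE.[ValueA], SOURCE.[ValueB]);
END;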

Main Question

Why are the SELECTs blocked by the [InsertOrUpdateInverterData] procedure that is only using MERGE commands?

Under the default locking read committed isolation level, shared (S) locks are taken when reading data, and typically (though not always) released soon after the read is completed. Some shared locks are held to the end of the statement.

A MERGE statement modifies data, so it will acquire S or update (U) locks when locating the data to change, which are converted to exclusive (X) locks just before performing the actual modification. Both U and X locks must be held to the end of the transaction.

This is true under all isolation levels except the 'optimistic' snapshot isolation (SI), not to be confused with versioning read committed, also known as read committed snapshot isolation (RCSI).
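As a quick check of which (if either) of these versioning options is currently enabled on your database, something like this works (sys.databases exposes both settings):

SELECT
    [name],
    is_read_committed_snapshot_on,    -- 1 if RCSI is enabled
    snapshot_isolation_state_desc     -- ON if SI is allowed
FROM sys.databases
WHERE [name] = DB_NAME();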

Nothing in your question shows a session waiting for an S lock being blocked by a session holding a U lock. These locks are compatible. Any blocking is almost certainly being caused by blocking on a held X lock. This can be a bit tricky to capture when a large number of short-term locks are being taken, converted, and released in a short time interval.
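If you want evidence rather than guesswork, one low-effort option (a sketch, not the only approach) is to enable the blocked process report and capture it with the 'Blocked process report' event in Profiler or a server-side trace:

-- Emit a blocked process report for anything blocked longer than 5 seconds.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'blocked process threshold (s)', 5;
RECONFIGURE;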

The open_tran_count: 1 on the InsertOrUpdateInverterData command is worth investigating. Although the command hadn't been running very long, you should check that you don't have a containing transaction (in the application or higher-level stored procedure) that is unnecessarily long. Best practice is to keep transactions as short as possible. This may be nothing, but you should definitely check.
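To see how long that transaction has actually been open, a quick sketch against the standard transaction DMVs:

-- Start time of each open user transaction; compare against session_id 146.
SELECT
    st.session_id,
    at.transaction_id,
    at.[name],
    at.transaction_begin_time
FROM sys.dm_tran_session_transactions AS st
JOIN sys.dm_tran_active_transactions AS at
    ON at.transaction_id = st.transaction_id
WHERE st.is_user_transaction = 1;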

Potential solution

As Kin suggested in a comment, you could look to enable a row-versioning isolation level (RCSI or SI) on this database. RCSI is the most often used, since it typically does not require as many application changes. Once enabled, the default read committed isolation level uses row versions instead of taking S locks for reads, so S-X blocking is reduced or eliminated. Some operations (e.g. foreign key checks) still acquire S locks under RCSI.
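Enabling RCSI is a single database-level setting; a minimal sketch, assuming the database is called InverterDb (a placeholder name):

-- WITH ROLLBACK IMMEDIATE kicks out current users so the change can complete;
-- schedule this for a quiet moment.
ALTER DATABASE [InverterDb]
    SET READ_COMMITTED_SNAPSHOT ON
    WITH ROLLBACK IMMEDIATE;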

Be aware though that row versions consume tempdb space, broadly speaking proportional to the rate of change activity and the length of transactions. You will need to test your implementation thoroughly under load to understand and plan for the impact of RCSI (or SI) in your case.
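During load testing you can watch the version store footprint in tempdb, for example (on SQL Server 2008 this DMV reports on tempdb only):

-- Version store size in MB (pages are 8 KB).
SELECT
    SUM(version_store_reserved_page_count) * 8 / 1024.0 AS version_store_mb
FROM sys.dm_db_file_space_usage;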

If you want to localize your usage of versioning, rather than enabling it for the whole workload, SI might still be a better choice. By using SI for the read transactions, you will avoid the contention between readers and writers, at the cost of readers seeing the version of the row before any concurrent modification started (more correctly, the read operation under SI will always see the committed state of the row at the time the SI transaction started). There is little or no benefit to using SI for the writing transactions, because write locks will still be taken, and you'll need to handle any write conflicts. Unless that is what you want :)

Note: Unlike RCSI (which, once enabled, applies to all transactions running at read committed), SI has to be explicitly requested using SET TRANSACTION ISOLATION LEVEL SNAPSHOT; after ALLOW_SNAPSHOT_ISOLATION has been turned on for the database.
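A sketch of what that looks like for the website's read queries, again with InverterDb as a placeholder database name and an illustrative predicate:

-- One-time database setting to allow SI.
ALTER DATABASE [InverterDb] SET ALLOW_SNAPSHOT_ISOLATION ON;

-- Then, per reader connection/transaction:
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;

BEGIN TRANSACTION;

    SELECT [InverterID], [TimeStamp], [ValueA], [ValueB]
    FROM [dbo].[InverterData]
    WHERE [TimeStamp] >= DATEADD(HOUR, -1, GETDATE());

COMMIT TRANSACTION;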

Subtle behaviours that depend on readers blocking writers (including in trigger code!) make testing essential. See my linked article series and Books Online for details. If you do decide on RCSI, be sure to review Data Modifications under Read Committed Snapshot Isolation in particular.

Finally, you should ensure your instance is patched to SQL Server 2008 Service Pack 4.