SQL Server – Massive INSERTs blocking SELECTs

locking, sql-server, sql-server-2008

I have a problem with a massive amount of INSERTs that are blocking my SELECT operations.

Schema

I have a table like this:

CREATE TABLE [InverterData](
    [InverterID] [bigint] NOT NULL,
    [TimeStamp] [datetime] NOT NULL,    
    [ValueA] [decimal](18, 2) NULL,
    [ValueB] [decimal](18, 2) NULL
    CONSTRAINT [PrimaryKey_e149e28f-5754-4229-be01-65fafeebce16] PRIMARY KEY CLUSTERED 
    (
        [TimeStamp] DESC,
        [InverterID] ASC
    ) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF
    , IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON
    , ALLOW_PAGE_LOCKS = ON)
)

I also have this little helper procedure that lets me insert or update (update on conflict) using the MERGE command:

CREATE PROCEDURE [InsertOrUpdateInverterData]
    @InverterID bigint, @TimeStamp datetime
    , @ValueA decimal(18,2), @ValueB decimal(18,2)
AS
BEGIN
    MERGE [InverterData] AS TARGET
        USING (VALUES (@InverterID, @TimeStamp, @ValueA, @ValueB))
        AS SOURCE ([InverterID], [TimeStamp], [ValueA], [ValueB])
        ON TARGET.[InverterID] = @InverterID AND TARGET.[TimeStamp] = @TimeStamp
    WHEN MATCHED THEN
        UPDATE
        SET [ValueA] = SOURCE.[ValueA], [ValueB] = SOURCE.[ValueB]              
    WHEN NOT MATCHED THEN
        INSERT ([InverterID], [TimeStamp], [ValueA], [ValueB]) 
        VALUES (SOURCE.[InverterID], SOURCE.[TimeStamp], SOURCE.[ValueA], SOURCE.[ValueB]);
END

Usage

I now have service instances running on multiple servers that perform massive updates by calling the [InsertOrUpdateInverterData] procedure rapidly.
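Each data point is written with a single procedure call, for example (the values here are made up for illustration):

-- Hypothetical example call; the real values come from the inverters.
EXEC [InsertOrUpdateInverterData]
    @InverterID = 42,
    @TimeStamp  = '2017-03-01T12:05:00',
    @ValueA     = 123.45,
    @ValueB     = 67.89;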

There is also a website that does SELECT queries on the [InverterData] table.

Problem

SELECT queries on the [InverterData] table take wildly different amounts of time, depending on the INSERT activity of my service instances.
If I pause all service instances, the SELECTs are lightning-fast; while the instances perform rapid inserts, the SELECTs get really slow or even time out.

Attempts

I've run some SELECTs against [sys.dm_tran_locks] to find the blocking processes, like this:

SELECT
tl.request_session_id,
wt.blocking_session_id,
OBJECT_NAME(p.OBJECT_ID) BlockedObjectName,
h1.TEXT AS RequestingText,
h2.TEXT AS BlockingText,
tl.request_mode

FROM sys.dm_tran_locks AS tl

INNER JOIN sys.dm_os_waiting_tasks AS wt ON tl.lock_owner_address = wt.resource_address
INNER JOIN sys.partitions AS p ON p.hobt_id = tl.resource_associated_entity_id
INNER JOIN sys.dm_exec_connections ec1 ON ec1.session_id = tl.request_session_id
INNER JOIN sys.dm_exec_connections ec2 ON ec2.session_id = wt.blocking_session_id
CROSS APPLY sys.dm_exec_sql_text(ec1.most_recent_sql_handle) AS h1
CROSS APPLY sys.dm_exec_sql_text(ec2.most_recent_sql_handle) AS h2

This is the result:

(Screenshot of the query result: the waiting sessions are requesting S locks.)

S = Shared. The holding session is granted shared access to the resource.

Question

Why are the SELECTs blocked by the [InsertOrUpdateInverterData] procedure that is only using MERGE commands?

Do I have to use some kind of transaction with a defined isolation level inside [InsertOrUpdateInverterData]?

Update 1 (related to question from @Paul)

Based on SQL Server's internal reporting, [InsertOrUpdateInverterData] shows the following statistics:

  • Average CPU-Time: 0.12ms
  • Average Read processes: 5.76 per/s
  • Average Write processes: 0.4 per/s

Based on this, it looks like the MERGE command is mostly busy with read operations that lock the table!(?)

Update 2 (related to question from @Paul)

The [InverterData] table has the following storage stats:

  • Data space: 26,901.86 MB
  • Row count: 131,827,749
  • Partitioned: true
  • Partition count: 62

Here is the (almost) complete sp_WhoIsActive result set:

SELECT command

  • dd hh:mm:ss.mss: 00 00:01:01.930
  • session_id: 73
  • wait_info: (12629ms)LCK_M_S
  • CPU: 198
  • blocking_session_id: 146
  • reads: 99,368
  • writes: 0
  • status: suspended
  • open_tran_count: 0

Blocking [InsertOrUpdateInverterData] command

  • dd hh:mm:ss.mss: 00 00:00:00.330
  • session_id: 146
  • wait_info: NULL
  • CPU: 3,972
  • blocking_session_id: NULL
  • reads: 376,95
  • writes: 126
  • status: sleeping
  • open_tran_count: 1

Best Answer

First, though slightly unrelated to the main question, your MERGE statement is potentially at risk of errors due to a race condition. The problem, in a nutshell, is that it is possible for multiple concurrent threads to conclude that the target row does not exist, resulting in colliding attempts to insert. The root cause is that it is not possible to take a shared or update lock on a row that does not exist. The solution is to add a hint:

MERGE [dbo].[InverterData] WITH (SERIALIZABLE) AS [TARGET]

The serializable isolation level hint ensures the key range where the row would go is locked. You have a unique index to support range locking, so this hint will not have an adverse effect on locking, you will simply gain protection against this potential race condition.
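Applied to the procedure in the question, that hint is the only change needed; a sketch (assuming the table lives in the dbo schema):

ALTER PROCEDURE [InsertOrUpdateInverterData]
    @InverterID bigint, @TimeStamp datetime
    , @ValueA decimal(18,2), @ValueB decimal(18,2)
AS
BEGIN
    -- SERIALIZABLE (a synonym for HOLDLOCK here) takes a key-range lock
    -- covering the row's position, closing the insert race condition window.
    MERGE [dbo].[InverterData] WITH (SERIALIZABLE) AS TARGET
        USING (VALUES (@InverterID, @TimeStamp, @ValueA, @ValueB))
        AS SOURCE ([InverterID], [TimeStamp], [ValueA], [ValueB])
        ON TARGET.[InverterID] = @InverterID AND TARGET.[TimeStamp] = @TimeStamp
    WHEN MATCHED THEN
        UPDATE
        SET [ValueA] = SOURCE.[ValueA], [ValueB] = SOURCE.[ValueB]
    WHEN NOT MATCHED THEN
        INSERT ([InverterID], [TimeStamp], [ValueA], [ValueB])
        VALUES (SOURCE.[InverterID], SOURCE.[TimeStamp], SOURCE.[ValueA], SOURCE.[ValueB]);
END;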

Main Question

Why are the SELECTs blocked by the [InsertOrUpdateInverterData] procedure that is only using MERGE commands?

Under the default locking read committed isolation level, shared (S) locks are taken when reading data, and typically (though not always) released soon after the read is completed. Some shared locks are held to the end of the statement.

A MERGE statement modifies data, so it will acquire S or update (U) locks when locating the data to change, which are converted to exclusive (X) locks just before performing the actual modification. Both U and X locks must be held to the end of the transaction.

This is true under all isolation levels except the 'optimistic' snapshot isolation (SI), not to be confused with versioning read committed, also known as read committed snapshot isolation (RCSI).
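As a quick check of which (if either) of these versioning options is currently enabled on your database, something like this works (sys.databases exposes both settings):

SELECT
    [name],
    is_read_committed_snapshot_on,    -- 1 if RCSI is enabled
    snapshot_isolation_state_desc     -- ON if SI is allowed
FROM sys.databases
WHERE [name] = DB_NAME();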

Nothing in your question shows a session waiting for an S lock being blocked by a session holding a U lock. These locks are compatible. Any blocking is almost certainly being caused by blocking on a held X lock. This can be a bit tricky to capture when a large number of short-term locks are being taken, converted, and released in a short time interval.
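If you want evidence rather than guesswork, one low-effort option (a sketch, not the only approach) is to enable the blocked process report and capture it with the 'Blocked process report' event in Profiler or a server-side trace:

-- Emit a blocked process report for anything blocked longer than 5 seconds.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'blocked process threshold (s)', 5;
RECONFIGURE;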

The open_tran_count: 1 on the InsertOrUpdateInverterData command is worth investigating. Although the command hadn't been running very long, you should check that you don't have a containing transaction (in the application or higher-level stored procedure) that is unnecessarily long. Best practice is to keep transactions as short as possible. This may be nothing, but you should definitely check.
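To see how long that transaction has actually been open, a quick sketch against the standard transaction DMVs:

-- Start time of each open user transaction; compare against session_id 146.
SELECT
    st.session_id,
    at.transaction_id,
    at.[name],
    at.transaction_begin_time
FROM sys.dm_tran_session_transactions AS st
JOIN sys.dm_tran_active_transactions AS at
    ON at.transaction_id = st.transaction_id
WHERE st.is_user_transaction = 1;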

Potential solution

As Kin suggested in a comment, you could look to enable a row-versioning isolation level (RCSI or SI) on this database. RCSI is the most often used, since it typically does not require as many application changes. Once enabled, the default read committed isolation level uses row versions instead of taking S locks for reads, so S-X blocking is reduced or eliminated. Some operations (e.g. foreign key checks) still acquire S locks under RCSI.
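Enabling RCSI is a single database-level setting; a minimal sketch, assuming the database is called InverterDb (a placeholder name):

-- WITH ROLLBACK IMMEDIATE kicks out current users so the change can complete;
-- schedule this for a quiet moment.
ALTER DATABASE [InverterDb]
    SET READ_COMMITTED_SNAPSHOT ON
    WITH ROLLBACK IMMEDIATE;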

Be aware though that row versions consume tempdb space, broadly speaking proportional to the rate of change activity and the length of transactions. You will need to test your implementation thoroughly under load to understand and plan for the impact of RCSI (or SI) in your case.
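During load testing you can watch the version store footprint in tempdb, for example (on SQL Server 2008 this DMV reports on tempdb only):

-- Version store size in MB (pages are 8 KB).
SELECT
    SUM(version_store_reserved_page_count) * 8 / 1024.0 AS version_store_mb
FROM sys.dm_db_file_space_usage;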

If you want to localize your usage of versioning, rather than enabling it for the whole workload, SI might still be a better choice. By using SI for the read transactions, you will avoid the contention between readers and writers, at the cost of readers seeing the version of the row before any concurrent modification started (more correctly, the read operation under SI will always see the committed state of the row at the time the SI transaction started). There is little or no benefit to using SI for the writing transactions, because write locks will still be taken, and you'll need to handle any write conflicts. Unless that is what you want :)

Note: Unlike RCSI (which, once enabled, applies to all transactions running at read committed), SI has to be explicitly requested using SET TRANSACTION ISOLATION LEVEL SNAPSHOT; after ALLOW_SNAPSHOT_ISOLATION has been turned on for the database.
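A sketch of what that looks like for the website's read queries, again with InverterDb as a placeholder database name and an illustrative predicate:

-- One-time database setting to allow SI.
ALTER DATABASE [InverterDb] SET ALLOW_SNAPSHOT_ISOLATION ON;

-- Then, per reader connection/transaction:
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;

BEGIN TRANSACTION;

    SELECT [InverterID], [TimeStamp], [ValueA], [ValueB]
    FROM [dbo].[InverterData]
    WHERE [TimeStamp] >= DATEADD(HOUR, -1, GETDATE());

COMMIT TRANSACTION;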

Subtle behaviours that depend on readers blocking writers (including in trigger code!) make testing essential. See my linked article series and Books Online for details. If you do decide on RCSI, be sure to review Data Modifications under Read Committed Snapshot Isolation in particular.

Finally, you should ensure your instance is patched to SQL Server 2008 Service Pack 4.