Sql-server – Framework to effectively identify blocking queries

blocking, locking, performance, performance-tuning, sql server, transaction

I need a routine to effectively identify which queries caused blocking. This is related to my previous question, "How to find the query that is still holding a lock?".

I know there is plenty of material online about this, but all of it rests on the premise that the last SQL statement on an active session is most likely the one that acquired the lock (and hence caused the blocking). That is not always true (in my case, never).

I've set the blocked process threshold to 30 seconds and started analysing the Blocked Process Reports (BPR).
A report is fired whenever a session has been blocked for longer than the threshold.
Each report contains information about the blocked spid and the blocking spid.
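
For reference, the threshold described above is set at the instance level with sp_configure. This is a minimal sketch, assuming sysadmin-level permissions; the value 30 is the one mentioned in the question. Note that the setting only makes SQL Server generate the reports; something still has to capture them (a trace, an event notification, or an Extended Events session).

```sql
-- 'blocked process threshold (s)' is an advanced option,
-- so advanced options must be visible first.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;

-- Generate a blocked process report for any task blocked for 30+ seconds.
EXEC sp_configure 'blocked process threshold (s)', 30;
RECONFIGURE;
```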

Often the blocking spid runs a couple more statements after the one that acquired, and is still holding, the lock on a resource (table, page or row). So despite the report content, I remain clueless about exactly which query caused the block.

Usually the SQL Server DMVs show only the last SQL text for each session_id, and the DMVs related to active locks (such as sys.dm_tran_locks) also don't address this issue.
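
To make the limitation concrete, here is a sketch of the kind of DMV query most online material relies on. It pairs each blocked request with the most recent statement of its blocker, which is exactly the information that may not correspond to the statement that took the lock. The column aliases are illustrative; with MARS, sys.dm_exec_connections can return more than one row per session.

```sql
-- For every blocked request, show the blocked statement and the
-- blocker's MOST RECENT statement (not necessarily the one holding the lock).
SELECT r.session_id            AS blocked_session_id,
       r.blocking_session_id,
       r.wait_type,
       r.wait_time             AS wait_time_ms,
       r.wait_resource,
       blocked_sql.text        AS blocked_sql_text,
       blocker_sql.text        AS blocker_last_sql_text
FROM sys.dm_exec_requests AS r
LEFT JOIN sys.dm_exec_connections AS c
       ON c.session_id = r.blocking_session_id
OUTER APPLY sys.dm_exec_sql_text(r.sql_handle)             AS blocked_sql
OUTER APPLY sys.dm_exec_sql_text(c.most_recent_sql_handle) AS blocker_sql
WHERE r.blocking_session_id <> 0;
```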

Tuning the blocked queries is not the best approach here: our application is built entirely on dynamic SQL embedded in client code, we don't use stored procedures, and in the blocking incidents I have seen so far, all of the blocked queries were correctly written and indexed.

I think one option would be to collect candidate queries that could have caused the blocking, and then look them up using the timestamp and spid gathered from the BPR. Do you agree? If so, can you point to a way of doing this with the least possible overhead using xEvents?

Best Answer

I'd suggest looking for long-running sessions, using an XEvents session.
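
As a rough starting point, a session along these lines would record the blocked process reports together with completed batches and RPC calls, so the spid and timestamp from a report can be matched against earlier statements on the blocking session. This is a sketch, not a tuned configuration. It assumes SQL Server 2012 or later (where the blocked_process_report event and the event_file target are available); the session name, file name and the database name YourDatabase are placeholders. The database filter is there only to limit overhead, and capturing every batch on a busy system can still be expensive.

```sql
CREATE EVENT SESSION [blocking_candidates] ON SERVER
-- Fired when the blocked process threshold is exceeded.
ADD EVENT sqlserver.blocked_process_report,
-- Ad hoc batches sent by the client.
ADD EVENT sqlserver.sql_batch_completed (
    ACTION (sqlserver.session_id, sqlserver.client_app_name)
    WHERE (sqlserver.database_name = N'YourDatabase')   -- placeholder name
),
-- Parameterized calls such as sp_executesql arrive as RPC events.
ADD EVENT sqlserver.rpc_completed (
    ACTION (sqlserver.session_id, sqlserver.client_app_name)
    WHERE (sqlserver.database_name = N'YourDatabase')   -- placeholder name
)
ADD TARGET package0.event_file (
    SET filename = N'blocking_candidates.xel',
        max_file_size = (50),        -- MB per file
        max_rollover_files = (4)
)
WITH (
    MAX_DISPATCH_LATENCY = 30 SECONDS,
    EVENT_RETENTION_MODE = ALLOW_SINGLE_EVENT_LOSS,
    STARTUP_STATE = OFF
);

ALTER EVENT SESSION [blocking_candidates] ON SERVER STATE = START;
```

Once a blocked process report shows up, filter the captured batch and RPC events on the blocking spid and look at the statements that completed before the report's timestamp.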

The problem you're describing sounds like you have client code that is performing row-by-agonizing-row (RBAR) processing instead of using efficient set-based approaches.

Poorly designed client applications may do something like this:

1. Connect to SQL Server to get a list of items that need processing. The query result is held open until all rows have been processed.
2. Perform some long-running process on each row.
3. Close the query.

What should happen is:

1. Connect to SQL Server.
2. Run a query, cache the results locally, and close the connection.
3. Run the long-running process against each row without keeping the original query open, which avoids the blocking.

A workaround for SQL Server can be implemented using snapshot isolation row versioning. See this Technet document for details. Essentially, row-versioning allows writers to not block readers, vastly reducing blocking.

Main Question

Why are the SELECTs blocked by the [InsertOrUpdateInverterData] procedure that is only using MERGE commands?

Under the default locking read committed isolation level, shared (S) locks are taken when reading data, and typically (though not always) released soon after the read is completed. Some shared locks are held to the end of the statement.

A MERGE statement modifies data, so it will acquire S or update (U) locks when locating the data to change, which are converted to exclusive (X) locks just before performing the actual modification. Both U and X locks must be held to the end of the transaction.

This is true under all isolation levels except the 'optimistic' snapshot isolation (SI), which is not to be confused with versioning read committed, also known as read committed snapshot isolation (RCSI).

Nothing in your question shows a session waiting for an S lock being blocked by a session holding a U lock. These locks are compatible. Any blocking is almost certainly being caused by blocking on a held X lock. This can be a bit tricky to capture when a large number of short-term locks are being taken, converted, and released in a short time interval.
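
If you want to see which locks are involved while the blocking is happening, a query like the following against sys.dm_tran_locks lists held X locks alongside any waiting requests. It is only a snapshot at the moment it runs, so short-lived locks can easily be missed.

```sql
-- Exclusive locks currently held, plus any lock request that is waiting.
SELECT l.request_session_id,
       l.resource_type,
       DB_NAME(l.resource_database_id) AS database_name,
       l.resource_associated_entity_id,
       l.resource_description,
       l.request_mode,
       l.request_status               -- GRANT, WAIT or CONVERT
FROM sys.dm_tran_locks AS l
WHERE l.request_mode = N'X'
   OR l.request_status = N'WAIT'
ORDER BY l.resource_database_id,
         l.resource_associated_entity_id,
         l.request_status;
```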

The open_tran_count: 1 on the InsertOrUpdateInverterData command is worth investigating. Although the command hadn't been running very long, you should check that you don't have a containing transaction (in the application or higher-level stored procedure) that is unnecessarily long. Best practice is to keep transactions as short as possible. This may be nothing, but you should definitely check.
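
One way to check for unexpectedly long containing transactions is to list the open user transactions with their start times. This is a sketch using the transaction DMVs; the open_seconds alias is just for illustration.

```sql
-- Open user transactions, oldest first, with the owning session.
SELECT s.session_id,
       s.login_name,
       s.program_name,
       s.host_name,
       atx.transaction_id,
       atx.name AS transaction_name,
       atx.transaction_begin_time,
       DATEDIFF(SECOND, atx.transaction_begin_time, GETDATE()) AS open_seconds
FROM sys.dm_tran_session_transactions AS st
JOIN sys.dm_tran_active_transactions AS atx
     ON atx.transaction_id = st.transaction_id
JOIN sys.dm_exec_sessions AS s
     ON s.session_id = st.session_id
WHERE st.is_user_transaction = 1
ORDER BY atx.transaction_begin_time;
```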

Potential solution

As Kin suggested in a comment, you could look to enable a row-versioning isolation level (RCSI or SI) on this database. RCSI is the most often used, since it typically does not require as many application changes. Once enabled, the default read committed isolation level uses row versions instead of taking S locks for reads, so S-X blocking is reduced or eliminated. Some operations (e.g. foreign key checks) still acquire S locks under RCSI.
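
For illustration, enabling RCSI looks roughly like this. YourDatabase is a placeholder. The change needs exclusive access to the database to complete, and WITH ROLLBACK IMMEDIATE gets it by rolling back other sessions' open transactions, so treat it as something to run in a maintenance window rather than casually.

```sql
-- Switch the default read committed level to use row versions.
-- ROLLBACK IMMEDIATE rolls back other sessions' open transactions.
ALTER DATABASE [YourDatabase]
    SET READ_COMMITTED_SNAPSHOT ON
    WITH ROLLBACK IMMEDIATE;

-- Confirm the current versioning settings.
SELECT name,
       is_read_committed_snapshot_on,
       snapshot_isolation_state_desc
FROM sys.databases
WHERE name = N'YourDatabase';
```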

Be aware though that row versions consume tempdb space, broadly speaking proportional to the rate of change activity and the length of transactions. You will need to test your implementation thoroughly under load to understand and plan for the impact of RCSI (or SI) in your case.
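
A simple way to watch version store usage during load testing is to sample the space reserved for it in tempdb. This is a minimal sketch; pages are 8 KB, hence the conversion.

```sql
-- Approximate size of the version store in tempdb, in MB.
SELECT SUM(version_store_reserved_page_count) * 8 / 1024.0 AS version_store_mb
FROM tempdb.sys.dm_db_file_space_usage;
```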

If you want to localize your usage of versioning, rather than enabling it for the whole workload, SI might still be a better choice. By using SI for the read transactions, you will avoid the contention between readers and writers, at the cost of readers seeing the version of the row before any concurrent modification started (more correctly, the read operation under SI will always see the committed state of the row at the time the SI transaction started). There is little or no benefit to using SI for the writing transactions, because write locks will still be taken, and you'll need to handle any write conflicts. Unless that is what you want :)

Note: Unlike RCSI (which once enabled applies to all transactions running at read committed), SI has to be explicitly requested using SET TRANSACTION ISOLATION LEVEL SNAPSHOT;.
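
A sketch of what localized SI usage for a read transaction might look like follows. YourDatabase, the table dbo.InverterData and its columns are hypothetical names used only for illustration; the database-level option has to be enabled once before any session can request the isolation level.

```sql
-- One-time setup: allow sessions to request snapshot isolation.
ALTER DATABASE [YourDatabase] SET ALLOW_SNAPSHOT_ISOLATION ON;

-- In the reading session only: opt in explicitly.
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;

BEGIN TRANSACTION;
    -- Reads see committed data as of the start of this transaction
    -- and take no shared locks, so concurrent writers do not block them.
    SELECT InverterId, ReadingTime, Value      -- hypothetical columns
    FROM dbo.InverterData                      -- hypothetical table
    WHERE ReadingTime >= DATEADD(HOUR, -1, GETDATE());
COMMIT TRANSACTION;
```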

Subtle behaviours that depend on readers blocking writers (including in trigger code!) make testing essential. See my linked article series and Books Online for details. If you do decide on RCSI, be sure to review Data Modifications under Read Committed Snapshot Isolation in particular.

Finally, you should ensure your instance is patched to SQL Server 2008 Service Pack 4.
