Sch-M WAIT Blocking Sch-S in SQL Server 2014 vs 2008 R2

lockingservice-brokersql server

We recently migrated our production instances from SQL 2008 R2 to brand new SQL 2014 servers. Here is an interesting scenario that we uncovered with our usage of Service Broker. Consider a database with Broker Enabled = true with MyService and MyQueue. Poison message handling is disabled on this queue. There are least 2 active conversations with messages in the queue.

In one process (SPID 100) execute:

BEGIN TRANSACTION;
DECLARE @conversation_group_id UNIQUEIDENTIFIER;
RECEIVE TOP (1) @conversation_group_id = conversation_handle FROM MyQueue;

Note that we leave the transaction open. Imagine that it's a .NET program that's waiting a long time on some external resource. Via sys.dm_tran_locks we see that this SPID has been granted an IX lock on the queue.

| type   | resource_id | mode | status | spid |
| OBJECT | 277576027   | IX   | GRANT  | 100  |

In a separate process (SPID 101) execute five times:

BEGIN TRANSACTION;
DECLARE @conversation_group_id UNIQUEIDENTIFIER;
RECEIVE TOP (1) @conversation_group_id = conversation_handle FROM MyQueue;
ROLLBACK TRANSACTION;

The key here is that we are rolling back the transaction five times. This triggers the built in Poison Message Handling background logic. While the queue does not get disabled (because it's configured to not disable), a background task is still trying to do work and fire a broker_queue_disabled event. So now if we query sys.dm_tran_locks again we'll see a different SPID (associated with BRKR TASK) waiting on an Sch-M lock.

| type   | resource_id | mode  | status | spid |
| OBJECT | 277576027   | IX    | GRANT  | 100  |
| OBJECT | 277576027   | Sch-M | WAIT   | 36   |

So far, everything makes sense.

Finally, on a different process (SPID 102), attempt to SEND to a Service using that Queue:

BEGIN TRANSACTION;
DECLARE @ch uniqueidentifier;
BEGIN DIALOG @ch FROM SERVICE [MyService] TO SERVICE 'MyService';
SEND ON CONVERSATION @ch ('HELLO WORLD');

The SEND command is blocked. If we look again at sys.dm_tran_locks we see that this process is waiting on a Sch-S lock. Executing sp_who2 we find that SPID 102 is blocked by SPID 36.

| type   | resource_id | mode  | status | spid |
| OBJECT | 277576027   | IX    | GRANT  | 100  |
| OBJECT | 277576027   | Sch-M | WAIT   | 36   |
| OBJECT | 277576027   | Sch-S | WAIT   | 102  |

Why does a Sch-S lock wait on a Sch-M lock that is also waiting?

This behavior is completely different in SQL 2008 R2! Using this exact same scenario, running on our yet-to-be-decommissioned 2008R2 instances, the final batch including the SEND command does not get blocked by the waiting Sch-M lock.

Has locking behavior changed in SQL 2012 or 2014? Is there perhaps some database or server setting which might affect this locking behavior?

Best Answer

The behaviour did change between SQL Server 2008 R2 and SQL Server 2012. The 2008 R2 implementation was inconsistent with the documented 'relaxed FIFO' semantics:

Locks are granted in a relaxed first-in, first-out (FIFO) fashion. Although the order is not strict FIFO, it preserves desirable properties such as avoiding starvation and works to reduce unnecessary deadlocks and blocking.

New lock requests where the requestor does not yet own a lock on the resource become blocked if the requested mode is incompatible with the union of granted requests and the modes of pending requests.

A conversion request becomes blocked only if the requested mode is incompatible with the union of all granted modes, excluding the mode in which the conversion request itself was originally granted.

In 2008 R2, a new Sch-S lock request was granted despite it being incompatible with the union of granted and waiting requests, which might lead to lock starvation. In 2012, the Sch-S lock request is blocked.

The reproduction script below uses regular tables rather than a Service Broker queue:

-- Session 1
CREATE TABLE dbo.LockTest (col1 integer NULL);

INSERT dbo.LockTest (col1) VALUES (1);

BEGIN TRANSACTION;

-- Will hold row-X, Pag-IX, and Tab-IX
INSERT dbo.LockTest (col1) VALUES (2);

-- Session 2
-- Blocked waiting on Sch-M
TRUNCATE TABLE dbo.LockTest;

-- Session 3
-- Takes Sch-S only
-- Not blocked in 2008 R2
SELECT * FROM dbo.LockTest AS LT WITH (READUNCOMMITTED);

In summary, 2008 R2 did not behave as designed. The problem was fixed in SQL Server 2012.