Sql-server – How to Prevent Partitioned Columnstore Deadlocks on SELECT

columnstoredeadlockpartitioningsql serversql-server-2016

I have three Clustered Columnstore Index (CCI) tables in SQL Server 2016. All of these CCIs are in the same partitioning scheme, based on Tenant ID. Lately, and inconsistently, I am getting deadlocks on simple select statements from joins to these tables. Example query that deadlocks:

SELECT  TOP 33 r.tenantid
FROM    Table_r r
        INNER JOIN Table_cm cm ON r.MyKey=cm.MyKey 
        INNER JOIN Table_pe pe ON r.MyKey=pe.MyKey 
WHERE   r.TenantId = 69
        AND pe.TenantId = 69
        AND cm.TenantId = 69

Error message:

Transaction (Process ID 56) was deadlocked on generic waitable object resources with another process and has been chosen as the deadlock victim. Rerun the transaction.

Clues:

  • If the query uses another index besides the CCI it does not deadlock.
  • If I remove two of the three tenantid filters it does not deadlock.
  • If I SELECT top 32 or lower it does not deadlock.
  • If I add OPTION (MAXDOP 1) it does not deadlock.
  • I can repro this in my scrambled PROD replica, PROD READ-ONLY Secondary, and PROD itself.
  • I cannot repro this behavior in DEV or INT.
  • It still deadlocks if I add WITH(NOLOCK) to all 3 table joins
  • The query deadlocks itself. It will deadlock when there are no other active processes.
  • Query plans without parallelism do not deadlock

Deadlock xml here

Our PROD Version:

Microsoft SQL Server 2016 (SP2-CU5) (KB4475776) – 13.0.5264.1 (X64) Jan 10 2019 18:51:38 Copyright (c) Microsoft Corporation Enterprise Edition (64-bit) on Windows Server 2012 R2 Standard 6.3 (Build 9600: ) (Hypervisor)

How do I prevent deadlocks on this query?

Best Answer

Since you're on SQL Server 2016, it's worth mentioning that there is at least one public bug fix for parallel deadlocks involving columnstore indexes:

FIX: A deadlock occurs when you run a parallel query on a clustered columnstore index in SQL Server 2016 and 2017

(thanks to Denis Rubashkin for providing the link initially)

This was released as part of SP1 CU7. If you're not up to that CU, you should give that a shot. This fix would also be included in SP2 (any of the CUs).

In general, the two approaches for fixing intra-query parallelism deadlocks:

  • avoid parallelism (by tuning the query so that it doesn't go parallel, using a MAXDOP hint, etc.) - this is covered in the other answer by Thomas Costers
  • apply the latest service pack / cumulative updates to SQL Server