Sql-server – Deadlock on partitioned table

deadlockpartitioningsql serversql-server-2012t-sql

we see deadlocks when loading a partitioned table. I wonder if someone can confirm what I think is the reason, or if I am looking at it wrong and we need to fix another thing.

Our setup is like this: A big data warehouse table gets data added to it every night. This is done through an SSIS Package, and in parallel. The data can be separated by column "country", and we run the same package for several countries at the same time to speed up things. One package alone simply does not use all resources of the machine.

With a non-partitioned table, we saw deadlocks once every few weeks. Not too bad, but too often to ignore.

We decided to partion the table by country, hoping to get rid of the deadlocks. I followed what I think is best practice, with an empty partition to the left and right, and one partition for each country. The country is encoded in an NVARCHAR(3) datatype. We recreated the clustered index using the partitioned column, and finally set lock escalation for the table to auto.

I can verify that all partitions are loaded with data belonging to it only, so all seems to be working fine.

However, we still keep getting deadlocks. I would have expected that my "write operation" (quotes will be explained next) would each focus on just one partition, and the parallel running SSIS packages could not interfere.

First question: Am I right in this expectation? Should I not be getting deadlocks now?

I think I have an answer to why they still appear: the "write operation" mentioned in quotes is somewhat complicated, it is an own SSIS component written in c# before my time with my employer. Each package creates and writes ALL DATA into it's own temp table, and then executes a

MERGE WHEN MATCHED UPDATE WHEN NOT MATCHED INSERT

from it's temp to the final table.
When building this MERGE-Statement, the component uses a construction to compare NULLs, i.e.

WHERE @given IS NULL and col is NULL OR @given is not null and col = @given

and it does this also for the country-column, which is part of the clustered index and partition key.

I suspect, that by having a query relating NULL to the partition key, I trigger a "search and lock every partition" operation. Is this the case, does something like this exist? Or does the description point to another reason why we still get deadlocks?

Thanks a lot for reading and your suggestions!

Best Answer

There are a number of "features" of merge statements that can lead to issues like this. For a relatively comprehensive list of issues take a look at the following link:

Use Caution with SQL Server's MERGE Statement

I've seen similar issues occurring myself, the solution was either to re-write the code to avoid MERGE altogether or in once case where it was unavoidable, dropping indexes on the target table before the MERGE and then restoring them afterwards resolved the issue.

Related Solutions

Sql-server – Full-text index in partitioned table

I did some more research and found a couple of useful articles with best practices regarding table partitioning and full-text searching on large tables. As I haven't received any answers to this question, I thought I'd post what I've found here for future reference.

I'm still reading those, but for what I've already read, I've found both of them useful and relevant to this particular situation:

SQL Server 2005 Full-Text Queries on Large Catalogs: Lessons Learned:

The observations in this paper are based on tests run in the SQL Server Customer Lab for a customer who needed to scale up full-text search to a much greater potential volume. The paper describes the customer scenario, provides an overview of SQL Server 2005 full-text concepts that bear on the results, and offers lessons learned and recommendations for using full-text queries on large catalogs.

Partitioned Tables and Indexes in SQL Server 2005

Summary: Table-based partitioning features in SQL Server 2005 provide flexibility and performance to simplify the creation and maintenance of partitioned tables. Trace the progression of capabilities from logically and manually partitioning tables to the latest partitioning features, and find out why, when, and how to design, implement, and maintain partitioned tables using SQL Server 2005. (41 printed pages)

Sql-server – SQL Server does not optimize parallel merge join on two equivalently partitioned tables

You're right that the SQL Server optimizer prefers not to generate parallel MERGE join plans (it costs this alternative quite high). Parallel MERGE always requires repartitioning exchanges on both join inputs, and more importantly, it requires that row order be preserved across those exchanges.

Parallelism is most efficient when each thread can run independently; order preservation often leads to frequent synchronization waits, and may ultimately cause exchanges to spill to tempdb to resolve an intra-query deadlock condition.

These problems can be circumvented by running multiple instances of the whole query on one thread each, with each thread processing an exclusive range of data. This is not a strategy that the optimizer considers natively, however. As it is, the original SQL Server model for parallelism breaks the query at exchanges, and runs the plan segments formed by those splits on multiple threads.

There are ways to achieve running whole query plans on multiple threads over exclusive data set ranges, but they require trickery that not everyone will be happy with (and will not be supported by Microsoft or guaranteed to work in the future). One such approach is to iterate over the partitions of a partitioned table and give each thread the task of producing a subtotal. The result is the SUM of the row counts returned by each independent thread:

Obtaining partition numbers is easy enough from metadata:

DECLARE @P AS TABLE
(
    partition_number integer PRIMARY KEY
);

INSERT @P (partition_number)
SELECT
    p.partition_number
FROM sys.partitions AS p 
WHERE 
    p.[object_id] = OBJECT_ID(N'test_transaction_properties', N'U')
    AND p.index_id = 1;

We then use these numbers to drive a correlated join (APPLY), and the $PARTITION function to limit each thread to the current partition number:

SELECT
    row_count = SUM(Subtotals.cnt)
FROM @P AS p
CROSS APPLY
(
    SELECT
        cnt = COUNT_BIG(*)
    FROM dbo.test_transaction_item_detail AS i
    JOIN dbo.test_transaction_properties AS t ON
        t.transactionID = i.transactionID
    WHERE 
        $PARTITION.pf_test_transactionId(t.transactionID) = p.partition_number
        AND $PARTITION.pf_test_transactionId(i.transactionID) = p.partition_number
) AS SubTotals;

The query plan shows a MERGE join being performed for each row in table @P. The clustered index scan properties confirm that only a single partition is processed on each iteration:

Apply serial plan

Unfortunately, this only results in sequential serial processing of partitions. On the data set you provided, my 4-core (hyperthreaded to 8) laptop returns the correct result in 7 seconds with all data in memory.

To get the MERGE sub-plans to run concurrently, we need a parallel plan where partition ids are distributed over the available threads (MAXDOP) and each MERGE sub-plan runs on a single thread using the data in one partition. Unfortunately, the optimizer frequently decides against parallel MERGE on cost grounds, and there is no documented way to force a parallel plan. There is an undocumented (and unsupported) way, using trace flag 8649:

SELECT
    row_count = SUM(Subtotals.cnt)
FROM @P AS p
CROSS APPLY
(
    SELECT
        cnt = COUNT_BIG(*)
    FROM dbo.test_transaction_item_detail AS i
    JOIN dbo.test_transaction_properties AS t ON
        t.transactionID = i.transactionID
    WHERE 
        $PARTITION.pf_test_transactionId(t.transactionID) = p.partition_number
        AND $PARTITION.pf_test_transactionId(i.transactionID) = p.partition_number
) AS SubTotals
OPTION (QUERYTRACEON 8649);

Now the query plan shows partition numbers from @P being distributed among threads on a round-robin basis. Each thread runs the inner side of the nested loops join for a single partition, achieving our goal of processing disjoint data concurrently. The same result is now returned in 3 seconds on my 8 hyper-cores, with all eight at 100% utilization.

Parallel APPLY

I am not recommending you use this technique necessarily - see my earlier warnings - but it does address your question.

See my article Improving Partitioned Table Join Performance for more details.

Columnstore

Seeing as you are using SQL Server 2012 (and assuming it is Enterprise) you also have the option of using a columnstore index. This shows the potential of batch mode hash joins where sufficient memory is available:

CREATE NONCLUSTERED COLUMNSTORE INDEX cs 
ON dbo.test_transaction_properties (transactionID);

CREATE NONCLUSTERED COLUMNSTORE INDEX cs 
ON dbo.test_transaction_item_detail (transactionID);

With these indexes in place the query...

SELECT
    COUNT_BIG(*)
FROM dbo.test_transaction_properties AS ttp
JOIN dbo.test_transaction_item_detail AS ttid ON
    ttid.transactionID = ttp.transactionID;

...results in the following execution plan from the optimizer without any trickery:

Columnstore plan 1

Correct results in 2 seconds, but eliminating the row-mode processing for the scalar aggregate helps even more:

SELECT
    COUNT_BIG(*)
FROM dbo.test_transaction_properties AS ttp
JOIN dbo.test_transaction_item_detail AS ttid ON
    ttid.transactionID = ttp.transactionID
GROUP BY
    ttp.transactionID % 1;

Optimized columnstore

The optimized column-store query runs in 851ms.

Geoff Patterson created the bug report Partition Wise Joins but it was closed as Won't Fix.

Best Answer

Related Solutions

Sql-server – Full-text index in partitioned table

Sql-server – SQL Server does not optimize parallel merge join on two equivalently partitioned tables

Columnstore

Related Question