Scenario
Partitioned table is empty and I am loading data for 1 partition which has 180k rows. I disabled the indexes and loaded the data and rebuilt the indexes after data is loaded.
Issue
While examining the query plan of rebuilt indexes, I can see the 'Estimated number of rows' is 180k but 'Actual number of rows' is 300 partitions * 180,000 rows = 54 Million rows, even though I am loading data for only one partition.
Can you shed some light on this behavior and how to overcome this issue?
Microsoft SQL Server 2016 (SP2) (KB4052908) – 13.0.5026.0 (X64) Mar 18 2018 09:11:49 Copyright (c) Microsoft Corporation Enterprise Edition (64-bit) on Windows Server 2012 R2 Standard 6.3 (Build 9600: )
Best Answer
Your maths are a little off here. The image provided shows an estimated 1,853,250 rows (not 180k) and a total of 674,582,272 rows (not 54 million) over 364 iterations (364 * 1,853,250 = 674,582,272).
Still, the question remains: Why would SQL Server read 674 million rows to rebuild a nonclustered index when the whole table only holds 1,853,250 rows?
The execution plan shown is a colocated join. It performs the following operations:
This is obviously a terribly inefficient way to go about things. But the optimizer is not crazy: the general idea is sound, the problem is that the Filter is applied too late. Normally, the 'current partition' predicate would be pushed down into the Clustered Index Scan, where it would appear as a seek to select the current partition.
How It Should Work
The usual parallel plan generated looks like this:
Notice the lack of a Filter operator. Its predicate has been pushed into the scan:
Yes, this is a scan with a seek. The idea is to fetch rows from only the current partition on each iteration of the loop, so each iteration builds one partition of the nonclustered index.
Your case
I managed to reproduce the execution plan shown in the question by using an
float
orbigint
data type as the partitioning column. Some aspect of the cardinality estimation for the Filter (probably a guess) prevents the optimizer pushing the 'current partition' predicate past the Sort, and into the Clustered Index Scan.The following demo creates a 300-partition table with a disabled nonclustered index, loads one million rows into a single partition, then rebuilds the nonclustered index. It uses the
bigint
data type:The index rebuild plan takes around 90 seconds to run:
By contrast, when the partitioning column is an
integer
data type, the rebuild takes two seconds (with the efficient scan-with-a-seek plan previously shown).This might be because the partition id is natively an
integer
(so no conversion is necessary in the filter predicate), but the details are unclear at this point.A workaround
The above only reproduces for me if I use the default (modern) cardinality estimation model. It is not possible to specify a hint directly on an index rebuild statement, so the following uses documented trace flag 9481 to temporarily use the legacy (original) cardinality estimation model instead:
This produces the optimal execution plan and completes in around two seconds.
The real solution
There is a better way to add or load data into a single partition of a table. The idea is to
SWITCH
the existing partition out into a standalone table, load the data into that, thenSWITCH
the partition back in:Partition switching is a metadata-only operation, and typically completes instantly.
The implementation above loads the one million new rows in less than two seconds. Note there is no need to disable the nonclustered index on the main table.