Sql-server – Query against partitioned table accessing more partitions than necessary

partitioningsql-server-2017

I first posted this on SO but got no replies, this is the reason for why I'm trying here now and I hope I'm not breaking any rules in doing so (fingers crossed).

I'm trying to understand how to use table partitioning and partition elimination correctly so that I can decide if I want to use it or not in a new database (sql server 17 express). I'm leaning towards not using it but that's a different story so this question is more that I wan't to know how it works now.
In the following script I create a partitioning function and scheme, a table using the scheme, insert some data and then query it in 3 different ways.

  • The first query is correctly only accessing the first partition (actual partition count = 1, actual partitions accessed = 1).

  • I expect the second query to only access the first partition as well given that I've created the partition function as range right on the value 20200101 but it is accessing partition 1 and 2 (actual partition count = 2, actual partitions accessed = 1..2).

  • I expect the third query to only access partition 2 but it is accessing partition 2 and 3 (actual partition count = 2, actual partitions accessed = 2..3).

I'm aware that I need to have exact type matching for partition elimination to occur but I think I have that here?
What am I missing?

create partition function partfun (datetime2(0))
as range right for values ('20200101', '20210101')

create partition scheme ps
as partition partfun all to ([primary])

create table t
(
    id int not null identity(1,1),
    logdate datetime2(0)
) on ps(logdate)
insert into t values ('20190101'),('20200102')

select * from t where logdate < cast('20191231' as datetime2(0))
select * from t where logdate < cast('20200101' as datetime2(0))
select * from t where logdate >= cast('20200101' as datetime2(0)) and t.logdate < cast('20210101' as datetime2(0))

Best Answer

These trivial queries are automatically parameterized. As such, the literal values are replaced with parameters during compilation. Partitions are eliminated at runtime dynamically using the internal RangePartitonNew function in a range seek predicate. Importantly, the seek predicate start/end range is inclusive so that the same cached plan can be reused for any possible value, not just exact boundary matches as in your queries. Rows are subsequently filtered based on the WHERE clause predicate to return only the rows desired.

Below is the seek predicate from the first 2 queries. At run time, the first query seek is equivilent to partition_number >= 1 AND partition_number <= 1 and the second is partition_number >= 1 AND partition_number <= 2. Partition 1 is hard-coded as the start range due to the less than operator.

Seek Keys[1]: 
    Start: PtnId1001 >= Scalar Operator((1)),
    End: PtnId1001 <= Scalar Operator(RangePartitionNew(CONVERT(datetime2(0),[@1],0),(1),'2020-01-01 00:00:00','2021-01-01 00:00:00'))

The seek predicate from the third query differs only in that the start range also includes RangePartitionNew to eliminate partitions excluded by the greater than or equal to comparison operator. This seek is equivilent to predicate partition_number >= 2 AND partition_number <= 3.

Seek Keys[1]: 
    Start: PtnId1001 >= Scalar Operator(RangePartitionNew(CONVERT(datetime2(0),[@1],0),(1),'2020-01-01 00:00:00','2021-01-01 00:00:00')),
        End: PtnId1001 <= Scalar Operator(RangePartitionNew(CONVERT(datetime2(0),[@2],0),(1),'2020-01-01 00:00:00','2021-01-01 00:00:00'))

You can avoid simple parameterization with these techniques from Paul's blog article:

  • OPTION (RECOMPILE)
  • Add a dummy predicate like WHERE 1 <> 2

If exact boundary values are your only use case, you could add a predicate to evaluate the partition function with a less than comparison operator. This will retain parameterization benefits. For example:

AND $PARTITION.partfun(logdate) < $PARTITION.partfun('20200101')

The resultant parameterized plan seek predicate subtracts one from the end of the boundary range to eliminate the partition excluded by the above predicate:

Seek Keys[1]: 
    Start: PtnId1001 >= Scalar Operator((1)), 
    End: PtnId1001 <= Scalar Operator(RangePartitionNew(CONVERT_IMPLICIT(datetime2(0),[@2],0),(1),'2020-01-01 00:00:00','2021-01-01 00:00:00')-(1))

Note that an index on the partitioning column will mitigate the need for these work-arounds. Even though an extra partition will be touched with exact boundary values, the overhead is only a couple of additional logical reads.