Sql-server – Partition Function vs. Clustered Index

indexpartitioningsql serversql-server-2008-r2

We have a table that is clustered on identity/datetime2. It is partitioned on the same datetime2. Are there any reasons to cluster on datetime2/identity instead? I understand the reasoning behind clustering generally, but with partitioning included, do things change?

Best Answer

Partitioning a table only divides it into "chunks" based on the partition function. The clustered index will give order to the data within each partition.

If you're planning to run queries that involve parts of a partition (i.e., show me sales between Jan 5th and Jan 12th), then it can be advantageous to those queries to have the date as the leading column of the clustering key. This type of structure will result in clustered index seeks, instead of partition scans. (Assuming there are no other suitable indexes on the table.)

If the queries only touch entire partitions, it doesn't matter as much, as partition elimination is enough to isolate the needed data. That said, ordering by date first could eliminate the need for an expensive sort operation, depending on what you're doing.

But this also depends on which columns you need from the table. If you only need to do something like aggregate total sales amounts over a date range, it may be sufficient to use a covering nonclustered index with the date as the leading (or only) column of the key, instead of rebuilding the entire table just for that.

If you do change the clustered index, this will affect singleton lookups (probably by the identity column which I assume is the primary key) as they will now involve a nonclustered index seek + key lookup. If this type of activity isn't a major part of the workload, this will be fine, but you have to be really careful that not too many rows are selected by these queries, or the optimizer will revert to a partition scan on the assumption that it's cheaper. Again, depending on which columns you need, it may be advantageous to create a covering nonclustered index that includes only the columns you need.