A partitioned table is really more like a collection of individual tables stitched together. So your in example of clustering by IncidentKey
and partition by IncidentDate
, say that the partitioning function splits the tables into two partitions so that 1/1/2010 is in partition 1 and 7/1/2010 is partition two. The data will be layed out on disk as:
Partition 1:
IncidentKey Date
ABC123 1/1/2010
ABC123 1/1/2011
XYZ999 1/1/2010
Partition 2:
IncidentKey Date
ABC123 7/1/2010
XYZ999 7/1/2010
At a low level there really are two, distinct rowsets. Is the query processor that gives the illusion of a single table by creating plans that seek, scan and update all rowsets together, as one.
Any row in any non-clustered index will have have the clustered index key to which it corresponds, say ABC123,7/1/2010
. Since the clustered index key always contains the partitioning key column, the engine will always know in what partition (rowset) of the clustered index to search for this value (in this case, in partition 2).
Now whenever you're dealing with partitioning you must consider if your NC indexes will be aligned (NC index is partitioned exactly the same as the clustered index) or non-aligned (NC index is non-partitioned, or partitioned differently from clustered index). Non-aligned indexes are more flexible, but they have some drawbacks:
Using aligned indexes solves these issues, but brings its own set of problems, because this physical, storage design, option ripples into the data model:
- aligned indexes mean unique constrains can no longer be created/enforced (except for the partitioning column)
- all foreign keys referencing the partitioned table must include the partitioning key in the relation (since the partitioning key is, due to alignment, in every index), and this in turn requires that all tables referencing the partitioned table contain partitioning key column value. Think Orders->OrderDetails, if Orders have OrderID but is partitioned by OrderDate, then OrderDetails must contain not only OrderID, but also OrderDate, in order to properly declare the foreign key constraint.
These effects I found seldom called out at the beginning of a project that deploys partitioning, but they exists and have serious consequences.
If you think aligned indexes are a rare or extreme case, then consider this: in many cases the cornerstone of ETL and partitioning solutions is the fast switch in of staging tables. Switch in operations require aligned indexes.
Oh, one more thing: all my argument about foreign keys and the ripple effect of adding the partitioning column value to other tables applies equally to joins.
Yes, it is redundant.
On the other hand, it won't hurt anything (since the optimizer will ignore it and not add the data twice).
This is one of the many reasons not to take the suggested indexes at face value.
Best Answer
Short answer:
If the clustering key would not be part of the key set then the index uniqueness across all partitions could not be guaranteed.
The long story:
The only way to guarantee uniqueness is by enforcing uniqueness via a B-Tree structure. With a B-Tree is really easy to enforce uniqueness, since any key has a deterministic logical position in the B-Tree (between its left neighbor, that is smaller key, and the right neighbor, which is a larger key). Since each key value can be inserted in a single position in the tree then uniqueness is easy to guarantee: simply check during insert if the position is already occupied by an existing row.
With partitioning uniqueness across all partitions is trickier, since there are many B-Tree structures involved (one for each partition). Each can guarantee uniqueness, within a partition, but together they cannot guarantee global uniqueness across all partitions: a key in partition X may also appear in partition Y. Therefore, in order to preserve the semantics of 'unique index', the clustering column must be part of the key. This way each key value deterministically bound to a specific partition: a key cannot appear in partition X and Y because the partitioning column is part of the key, therefore the key can belong to exactly partition X or Y.
@MarkStoreySmith asked: So why doesn't SQL Server check for uniqueness by navigating each of the B-tree structures, acquiring range locks where the value being tested is not present, and holding until each B-tree has been traversed? Seeking each B-Tree would be a performance hog (think 15,000 partitions to be checked at each insert). And also the locking complications (hopping from one B-Tree to the next) are significant, with huge potential for deadlocks.