I hate to state the obvious but I'd say test both scenarios and run your production queries against all 3 scenarios (scenario 1 being the current non partitioned one). The reason I state this is because I don't know what your code is querying. Do they actually have a benefit of being sorted in the base table by the identity column as opposed to the other column? For example, are your queries actually looking for the row ID often? If you're not sure, you might be surprised by getting some performance benefits. The general idea of using ID as the clustered key was for range scans, but in our case, we scan by customerID and Date, so it worked out perfectly for us, and perhaps you. Check out this article by Kim Tripp:
http://www.sqlskills.com/blogs/kimberly/post/The-Clustered-Index-Debate-Continues.aspx
"What is often cited as the “reason” for IDENTITY PRIMARY KEY clustered index definitions is its monotonic nature, thus minimizing page splits. However, I argue that this is the only “reason” for defining the clustered index as such, and is the poorest reason in the list. Page Splits are managed by proper FILLFACTOR not increasing INSERTS. Range Scans are the most important “reason” when evaluating clustered index key definitions and IDENTITies do not solve this problem.Moreover, although clustering the IDENTITY surrogate key will minimize page splits and logical fragmentation due to its monotonic nature, it will not reduce EXTENT FRAGMENTATION, which can cause just as problematic query performance as page splitting.
In short, the argument runs shallow
"
Typically you want your clustered index to be as narrow as possible, unique, and non nullable as it get's carried into all other indexes. I have a table with roughly 10 billion rows and we partition off the datetime column, which has worked out great.
Which of the following primary keys (the column order is different) is preferred?
Like all indexing decisions, much depends on how the table will be queried.
All partitioned indexes (for SQL Server 2008 and later) have the partition ID (not partitioning key value) as a hidden leading key column in each partitioned index, so the effective competing definitions are:
PartitionID, K1, K2
vs.
PartitionID, K2, K1
This affects the utility of each index for different types of queries, as one would expect. The main extra consideration is that inequality seeks on the first real key (K1 or K2) are still supported, regardless of any inequality seek and/or partition elimination operations on the PartitionID column.
For example, the (K1, K2) index specification can seek to a range of partitions, and a range of K1 values simultaneously:
SELECT T1.*
FROM dbo.T1 AS T1
WHERE 1 = 1
AND T1.K1 >= CONVERT(date, '20080711', 112)
AND T1.K1 <= CONVERT(date, '20100711', 112);
The same query on a table with (K2, K1) as the clustered index key could seek find the range of partitions, but it would have to fully scan each qualifying partition to locate rows that match the K1 predicates exactly. To be clear, the test of K1 values would be applied as a residual predicate, not as a seeking operation.
This will show as a Clustered Index Scan in showplan, with a partition-eliminating seek, and residual predicate on K1 values:
A subtle point when using the date data type as the partitioning key is that you must be careful to use explicit date types in your queries if you expect partition elimination to occur reliably. Using other types, such as datetime, is easily done (by accident) but will often prevent elimination, where it would logically be expected.
For example, this query will touch all partitions:
DECLARE @dt datetime = '20080711';
SELECT *
FROM dbo.T1 AS T1
WHERE T1.K1 = @dt;
Whereas this query will touch only a single partition:
DECLARE @dt date = '20080711';
SELECT *
FROM dbo.T1 AS T1
WHERE T1.K1 = @dt;
Both queries look superficially identical in graphical showplan (a clustered index seek). You need to check the operator properties in detail to check if static or dynamic partition elimination is being applied.
For the example join query given in the question: Both indexing strategies include the K2 column, but neither can generally provide rows in K2 order without a sort. As a result, either index is equally good for a hash or nested loops join, but neither can provide the input order required for a merge join on K2.
This might seem counter-intuitive for the (K2, K1) index, but remember the leading PartitionID key. Each partition has rows in (K2, K1) order. Unless exactly one partition is specified in the query, a sort will be necessary to return rows in K2 order. The (K1, K2) index could only return rows in K2 order for a single partition and a single given value of K1.
The proposed clustered primary key (K1, K2) has the potential advantage of minimizing base table page splits if appended data is actually sorted by the clustering key during the insert operation. For the (K1, K2) index, this would mean rows sorted by (PartitionID, K1, K2). For (K2, K1), it would be (PartitionID, K2, K1).
Related reading: Query Processing Enhancements on Partitioned Tables and Indexes
Best Answer
You cannot have an unique constraints backed by an aligned index (or plain unique non-clustered indexes) unless you add the partitioning column to the the unique expression. So if you have partitioned your table on column
[datetime]
then your unique constraint (or the unique index) must be([datetime], [xyz])
. Since more often than not this is not acceptable, the alternative are to: