Sql-server – Large Fact table and partitioning key dilemma

clustered-indexnonclustered-indexpartitioningsql-server-2008-r2

I have fairly large fact table (2 billion records, approx 120 GB). This table is not partitioned and the queries are very slow to respond. I am planning to partition the table and indexes.
The table has an identity column which is the primary key and has a clustered index on it. There are other non-clustered indexes on it but I won't go in the details much here. The column, I am trying to partition is not part of primary key but is not null and this is creating a slight dilemma for me. I have two options.

  1. I add this column as part of primary key i.e. composite primary key. Since the first column is identity, the combination would always be unique which means I don't have to worry about the applications accessing the table. The clustered index will automatically be partition aligned and other indexes can also be partition aligned.

  2. The seconds option is to remove the clustered index on the identity column and make it unique non clustered. This index cannot be partitioned aligned since partition key is not part of it and hence would have to sit on one drive. Then create a clustered index on the partition key column which can be partition aligned and so all the other non clustered indexes.

Our DBA is in favour of second option since he doesn't want to change primary key. I am worried about the performance hit in option 2 since the index is not partition aligned.

I would appreciate any feedback plus any other method you would have used in such situation.

Best Answer

I hate to state the obvious but I'd say test both scenarios and run your production queries against all 3 scenarios (scenario 1 being the current non partitioned one). The reason I state this is because I don't know what your code is querying. Do they actually have a benefit of being sorted in the base table by the identity column as opposed to the other column? For example, are your queries actually looking for the row ID often? If you're not sure, you might be surprised by getting some performance benefits. The general idea of using ID as the clustered key was for range scans, but in our case, we scan by customerID and Date, so it worked out perfectly for us, and perhaps you. Check out this article by Kim Tripp: http://www.sqlskills.com/blogs/kimberly/post/The-Clustered-Index-Debate-Continues.aspx

"What is often cited as the “reason” for IDENTITY PRIMARY KEY clustered index definitions is its monotonic nature, thus minimizing page splits. However, I argue that this is the only “reason” for defining the clustered index as such, and is the poorest reason in the list. Page Splits are managed by proper FILLFACTOR not increasing INSERTS. Range Scans are the most important “reason” when evaluating clustered index key definitions and IDENTITies do not solve this problem.Moreover, although clustering the IDENTITY surrogate key will minimize page splits and logical fragmentation due to its monotonic nature, it will not reduce EXTENT FRAGMENTATION, which can cause just as problematic query performance as page splitting. In short, the argument runs shallow "

Typically you want your clustered index to be as narrow as possible, unique, and non nullable as it get's carried into all other indexes. I have a table with roughly 10 billion rows and we partition off the datetime column, which has worked out great.