I hate to state the obvious, but I'd say test both scenarios and run your production queries against all three (scenario 1 being the current non-partitioned one). I say this because I don't know what your code is querying: do your queries actually benefit from the base table being sorted by the identity column rather than by the other column? For example, do they frequently seek on the row ID? If you're not sure, you might be surprised to find some performance benefits. The usual justification for using the ID as the clustered key is to support range scans, but in our case we scan by CustomerID and Date, so that key worked out perfectly for us, and it may for you too. Check out this article by Kim Tripp:
http://www.sqlskills.com/blogs/kimberly/post/The-Clustered-Index-Debate-Continues.aspx
"What is often cited as the “reason” for IDENTITY PRIMARY KEY clustered index definitions is its monotonic nature, which minimizes page splits. However, I argue that this is the only “reason” for defining the clustered index this way, and it is the weakest reason on the list. Page splits are managed by a proper FILLFACTOR, not by limiting INSERTs. Range scans are the most important “reason” to consider when evaluating clustered index key definitions, and IDENTITY columns do not solve that problem. Moreover, although clustering on the IDENTITY surrogate key will minimize page splits and logical fragmentation thanks to its monotonic nature, it will not reduce extent fragmentation, which can hurt query performance just as much as page splitting. In short, the argument runs shallow."
Typically you want your clustered index key to be as narrow as possible, unique, and non-nullable, since it gets carried into all other indexes. I have a table with roughly 10 billion rows, and we partition on the datetime column, which has worked out great.
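As a sketch of the approach described above (the table and column names here are hypothetical, not from the original question):

```sql
-- Hypothetical example: cluster on the columns the queries actually range-scan,
-- and keep the IDENTITY primary key as a nonclustered index for uniqueness.
CREATE TABLE dbo.Orders
(
    OrderID     int IDENTITY(1,1) NOT NULL,
    CustomerID  int NOT NULL,
    OrderDate   datetime2 NOT NULL,
    Amount      money NOT NULL,
    CONSTRAINT PK_Orders PRIMARY KEY NONCLUSTERED (OrderID)
);

-- Clustered index on (CustomerID, OrderDate); appending OrderID keeps the key
-- unique and therefore as narrow as it can be while still being unique.
CREATE UNIQUE CLUSTERED INDEX CX_Orders_Customer_Date
    ON dbo.Orders (CustomerID, OrderDate, OrderID);
```

With this layout, queries that filter on CustomerID and a date range read contiguous pages instead of probing scattered rows.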
I found that this is a new feature (though I don't know what it's useful for).
Before SQL Server 2012, identity allocations were always individually logged (as each value was used). This per-row logging activity could limit throughput in scenarios where many identity values are generated in a short space of time. To improve efficiency, SQL Server 2012 (and later) logs only the allocation of a batch of identity values. The allocated range is cached and issued on demand until a new batch of values is needed.
If SQL Server is restarted without the database containing the identity object being checkpointed on shut down, any remaining unused values in the cached range are lost, resulting in a jump in the values on restart.
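A quick way to observe the jump described above, assuming a throwaway test table (the table name is made up for illustration):

```sql
-- Hypothetical repro on SQL Server 2012+: insert a row, kill the instance
-- without a clean checkpoint (e.g. via Task Manager), restart, insert again.
CREATE TABLE dbo.JumpTest (ID int IDENTITY(1,1) PRIMARY KEY, Payload char(1) NOT NULL);

INSERT dbo.JumpTest (Payload) VALUES ('a');
SELECT IDENT_CURRENT('dbo.JumpTest');  -- 1

-- ...after an unclean restart of the instance...
INSERT dbo.JumpTest (Payload) VALUES ('b');
SELECT IDENT_CURRENT('dbo.JumpTest');  -- jumps, typically by ~1000 for an int identity
```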
Unfortunately, common ways of shutting down SQL Server 2012 do not currently checkpoint databases automatically (this contradicts the documentation, so it should be fixed at some point in the future). To avoid this particular cause of jumps in assigned identity values on SQL Server 2012, always shut down SQL Server using the T-SQL SHUTDOWN command (without the NOWAIT option). The SHUTDOWN command will checkpoint all user databases correctly before shutting down the server. Do not use the Windows Service Control application, SQL Server Configuration Manager, the SQL Server Management Studio UI, or any other method.
You could also manually CHECKPOINT all databases before shutting down SQL Server by any other method, but this requires you to ensure that no identity-allocation activity occurs in any database after your checkpoints and before the shutdown completes. That may not be easy to achieve reliably.
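A sketch of the manual approach. Note that sp_MSforeachdb is an undocumented system procedure; a cursor over sys.databases achieves the same thing with supported syntax:

```sql
-- Checkpoint every database, then shut down from the same session.
EXEC sp_MSforeachdb N'USE [?]; CHECKPOINT;';

-- Without NOWAIT, SHUTDOWN also checkpoints user databases on the way down,
-- which covers any identity allocations that slipped in after the loop above.
SHUTDOWN;
```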
More information and background can be found in Lost Identity by Kalen Delaney.
Trace flag 272 is now documented in DBCC TRACEON - Trace Flags (Transact-SQL):
Disables identity pre-allocation to avoid gaps in the values of an identity column in cases where the server restarts unexpectedly or fails over to a secondary server. Note that identity caching is used to improve INSERT performance on tables with identity columns.
Note: Starting with SQL Server 2017, to accomplish this at the database level, see the IDENTITY_CACHE option in ALTER DATABASE SCOPED CONFIGURATION (Transact-SQL).
Scope: global only
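Both mitigations mentioned above are documented and can be applied as follows (the trace flag is instance-wide; the scoped configuration applies to the current database):

```sql
-- SQL Server 2012+: disable identity pre-allocation for the whole instance.
DBCC TRACEON (272, -1);

-- SQL Server 2017+: disable identity caching for the current database only.
ALTER DATABASE SCOPED CONFIGURATION SET IDENTITY_CACHE = OFF;
```

Either option trades some INSERT throughput for gap-free identity allocation after unexpected restarts or failovers.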
Best Answer
Assuming the index is 100% full, the first few inserts at the new key value will cause one or two page splits, where half the rows on the full page are copied to the new page, and then the new row is inserted into one of the half-full pages.
However after that you resume low-cost inserts and page allocations. When the page fills up a new, empty page is allocated and inserted into the doubly-linked list of leaf pages. No rows are copied to the new page, and the new row is written to the new page.
The only difference between this and an end-of-index insert is the presence of a next page after the new page. And the next page must be updated with a reverse page pointer to the new page.
See this post for details: Good Page Splits and Sequential GUID Key Generation
No.
As SQL Server continues inserting into the "middle" of the index, when the target page for an insert is full, SQL Server will always allocate a new page. It won't put the row on the next existing page, even if that page is not full. The non-leaf levels of the clustered index store the starting key value for each leaf-level page, so SQL Server can't insert the new row onto the next existing page without rewriting the non-leaf level of the index.
You can see this behavior easily by forcing page locks and looking at the lock footprint of your inserts (or, less easily, with DBCC PAGE). In the example below, a clustered index table that can fit exactly 4 rows per page takes a series of mid-index inserts.
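A sketch of the lock-footprint technique, with a made-up table name; the char(2000) filler column is sized so roughly four rows fit on an 8 KB page:

```sql
-- Table where about four rows fit per page.
CREATE TABLE dbo.SplitDemo
(
    ID     int NOT NULL,
    Filler char(2000) NOT NULL DEFAULT 'x',
    CONSTRAINT PK_SplitDemo PRIMARY KEY CLUSTERED (ID)
);

-- Force page-level locks so the footprint is easy to read.
ALTER INDEX PK_SplitDemo ON dbo.SplitDemo SET (ALLOW_ROW_LOCKS = OFF);

BEGIN TRANSACTION;
    INSERT dbo.SplitDemo (ID) VALUES (3);  -- a mid-index insert
    -- The PAGE locks held by this session show which pages the insert touched;
    -- a split shows up as locks on more pages than a simple insert needs.
    SELECT resource_type, resource_description, request_mode
    FROM sys.dm_tran_locks
    WHERE request_session_id = @@SPID AND resource_type = 'PAGE';
ROLLBACK;
```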
So now page 328 is full and the next page, 332, has only 3 rows. But the lowest value on 332 is 11, so when we insert the value 4, does it go on 332, which would cause another bad page split when 5 is inserted?
So another "bad page split", but this one only moved the row with id=10, leaving page 328 full with values 1, 2, 3, 4.
Now all the larger key values are on separate pages, and from here on out there are no more bad page splits. The next insert requires a new page, and only the new row goes on the new page:
The new page, 334, has only the last-inserted row.
So after a couple of initial "bad page splits" that required moving rows, subsequent new pages are simply allocated and spliced into the middle of the doubly-linked list of leaf pages. The new pages are allocated when a new row won't fit on the previous page, and no rows are migrated to the new page.