I have a composite primary key consisting of a datetime, a location id and a 2-character code. This key uniquely identifies each row of my table.
I know that the typical load pattern will insert a week of data from a year ago, then a week of data from this year. Therefore, the table is likely to become fragmented, as the primary key jumps back and forth.
I would like to know what sort of indexing strategies can be employed in this scenario. My gut feeling is that the default clustered index is a bad idea and I should replace that with a non-clustered index, which would make rebuilds easier.
Best Answer
If you are worried about fragmentation, do you have a maintenance solution on your server that addresses it, such as Ola Hallengren's SQL Server Maintenance Solution?
Because most people assume the Primary Key is the Clustering Key, I am going to interpret your question as "How to index a composite clustering key with non-sequential inserts?".
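To make the "non-sequential inserts" part concrete, here is a toy model of the leaf level of a clustered index. It is only a sketch under simplifying assumptions, not how SQL Server is actually implemented: pages hold a fixed number of rows (`capacity` is made up), an insert past the end of the index starts a new page, and an insert into a full page anywhere else splits that page in half. The integer keys stand in for the datetime-led composite key, and the `load` and `week` helpers are hypothetical names.

```python
import bisect

def load(keys, capacity=100):
    """Insert keys one by one into a toy leaf level; return (page_count, splits)."""
    pages = [[]]                # leaf pages, kept in key order
    lows = [float("-inf")]      # lowest key routed to each page
    splits = 0
    for key in keys:
        idx = bisect.bisect_right(lows, key) - 1   # page that owns this key
        page = pages[idx]
        bisect.insort(page, key)
        if len(page) <= capacity:
            continue
        if idx == len(pages) - 1 and page[-1] == key:
            # Insert at the very end of the index: start a new page, no split.
            page.pop()
            pages.append([key])
            lows.append(key)
        else:
            # Insert in the middle of the index: split the full page in half.
            mid = len(page) // 2
            right = page[mid:]
            del page[mid:]
            pages.insert(idx + 1, right)
            lows.insert(idx + 1, right[0])
            splits += 1
    return len(pages), splits

def week(start, rows=700):
    """One week of rows as consecutive integer keys (stand-in for datetimes)."""
    return list(range(start, start + rows))

# Alternate a week from last year with the matching week from this year.
WEEKS = 10
interleaved = []
for w in range(WEEKS):
    interleaved += week(w * 1000)          # last year
    interleaved += week((52 + w) * 1000)   # this year

for label, keys in (("interleaved", interleaved), ("sorted", sorted(interleaved))):
    pages, splits = load(keys)
    fill = len(keys) / (pages * 100)
    print(f"{label}: {pages} pages, {splits} splits, {fill:.0%} average fill")
```

In this model the sorted load never splits a page, while the back-and-forth load splits pages every time a "last year" week lands in the middle of the index. Real page density also depends on row size and fill factor, so treat the numbers as illustrative only.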
The situation you are considering is related to debates like The Clustered Index Debate and Surrogate Key vs. Natural Key.
In this situation, I would want to consider the impact of a [16?] byte composite clustering key vs a 4 or 8 byte clustering key (int / bigint). My decision tree would look something like this:
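As a rough way to weigh the key width, here is a back-of-the-envelope sketch. The column types are my assumption (the question does not state them): a `datetime` at 8 bytes, an `int` location id at 4 bytes and a `char(2)` code at 2 bytes, which comes to 14 bytes, in the same range as the estimate above. The row count and number of non-clustered indexes are made-up inputs, and the result ignores row overhead and the fact that key columns already present in a non-clustered index are not stored twice, so read it as an upper bound.

```python
# Assumed fixed storage sizes in bytes for the SQL Server types involved.
TYPE_BYTES = {"datetime": 8, "int": 4, "bigint": 8, "char(2)": 2}

def key_width(columns):
    """Total width in bytes of a clustering key made of the given column types."""
    return sum(TYPE_BYTES[c] for c in columns)

def extra_index_bytes(rows, nonclustered_indexes, wide_key, narrow_key):
    """Extra bytes carried by non-clustered indexes because of the wider key."""
    return rows * nonclustered_indexes * (wide_key - narrow_key)

composite = key_width(["datetime", "int", "char(2)"])   # assumed column types
surrogate_int = key_width(["int"])
surrogate_bigint = key_width(["bigint"])

# Hypothetical table: 50 million rows, 3 non-clustered indexes.
rows, indexes = 50_000_000, 3
for name, width in (("int", surrogate_int), ("bigint", surrogate_bigint)):
    extra = extra_index_bytes(rows, indexes, composite, width)
    print(f"composite ({composite} B) vs {name} ({width} B): "
          f"{extra / 1024**2:.0f} MiB extra across {indexes} indexes")
```

With zero or one non-clustered index the difference mostly disappears, which is why the number of supporting indexes matters in the decision tree that follows.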
Will we be using Hekaton (In-Memory OLTP)?
Yes => composite key. Run away.
No => Good call, continue...
How many rows will this table have?
Tens of millions, maybe more! => surrogate key (probably).
If the data length of each row will be variable and not narrow => surrogate key.
If the data length of the row will be fixed and narrow, and it results in optimal page usage => continue...
Less than that => continue...
How will the table be queried?
Various combinations of, and not always all of, datetime, location_id, character_code => surrogate key.
You would need several non-clustered indexes on datetime, location_id, character_code for your queries. The clustering key is included in all non-clustered indexes, and the larger it is the more space/pages each index entry will require. => surrogate key.
Almost always by all three (datetime, location_id, character_code), or almost always by datetime or by datetime, location_id; but not by location_id without datetime, and not by character_code without datetime, location_id (zero or only a couple of non-clustered indexes on this table) => continue...
Will any other table reference this table?
Yes => surrogate key.
No => composite key is a reasonable option.
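For anyone who prefers to read the decision tree above as code, here is a minimal sketch of it. The `TableProfile` fields and the `LARGE_TABLE_ROWS` threshold are hypothetical names and values I made up to stand in for the questions; the branches follow the tree as written, including the In-Memory OLTP branch.

```python
from dataclasses import dataclass

# Assumption: where "tens of millions" starts. Adjust to taste.
LARGE_TABLE_ROWS = 10_000_000

@dataclass
class TableProfile:
    uses_in_memory_oltp: bool          # Hekaton?
    expected_rows: int
    rows_fixed_and_narrow: bool        # fixed-length, narrow rows with good page usage
    queries_follow_key_prefix: bool    # by datetime, or datetime + location_id, or all three
    referenced_by_other_tables: bool   # would other tables carry this key as a foreign key?

def suggest_clustering_key(p: TableProfile) -> str:
    """Walk the decision tree and return 'composite' or 'surrogate'."""
    if p.uses_in_memory_oltp:
        return "composite"
    if p.expected_rows >= LARGE_TABLE_ROWS and not p.rows_fixed_and_narrow:
        return "surrogate"
    if not p.queries_follow_key_prefix:
        # Mixed query patterns need several non-clustered indexes,
        # and each one carries the clustering key.
        return "surrogate"
    if p.referenced_by_other_tables:
        return "surrogate"
    return "composite"

# Hypothetical table: modest size, narrow rows, queried by date, no referencing tables.
profile = TableProfile(
    uses_in_memory_oltp=False,
    expected_rows=5_000_000,
    rows_fixed_and_narrow=True,
    queries_follow_key_prefix=True,
    referenced_by_other_tables=False,
)
print(suggest_clustering_key(profile))
```

The "(probably)" in the tree does not survive translation into a boolean, so treat the output as a starting point for discussion rather than a verdict.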
Even if my first run-through of the above decision tree leads me to a composite key, I would probably start my design with a surrogate key, because it is easier to drop an unused surrogate key later than to go back, add one, and rework everything to use it.
Just to clarify, I have had cases where I did find that the composite key was a better solution and did refactor the design to drop the surrogate key. I don't want to leave the impression that the surrogate key is always the better solution, even if it is a common default for many designers (including myself).