Sql-server – How to index a composite primary key with non-sequential inserts

clustered-index, index, sql-server, sql-server-2008-r2

I have a composite primary key consisting of a datetime, a location id and a 2-character code. This key uniquely identifies each row of my table.

I know the load pattern for this table: a week of data from a year ago is inserted, then a week of data from this year. The table is therefore likely to become fragmented as the primary key values jump back and forth.

I would like to know what sort of indexing strategies can be employed in this scenario. My gut feeling is that the default clustered index is a bad idea and I should replace that with a non-clustered index, which would make rebuilds easier.

Best Answer

If you are worried about fragmentation, do you have a maintenance solution on your server that addresses it, such as Ola Hallengren's SQL Server Maintenance Solution?
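
Before choosing a strategy, it may help to measure how fragmented the table actually gets after a few load cycles. A minimal sketch, assuming a hypothetical table name dbo.Reading (the question does not name the table):

```sql
-- dbo.Reading is a placeholder name; substitute the real table.
-- 'LIMITED' scans only the upper index levels, so it is the cheapest mode.
SELECT
    i.name                           AS index_name,
    ips.index_type_desc,
    ips.avg_fragmentation_in_percent,
    ips.page_count
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID(N'dbo.Reading'), NULL, NULL, 'LIMITED') AS ips
JOIN sys.indexes AS i
    ON  i.object_id = ips.object_id
    AND i.index_id  = ips.index_id
ORDER BY ips.avg_fragmentation_in_percent DESC;
```

Running this before and after a weekly load shows whether the back-and-forth inserts cause enough fragmentation to matter for this workload.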

Questions:

Is the datetime column's data type date, smalldatetime, datetime, datetime2(0-2), datetime2(3-4), datetime2(5+), or something else?
Is the LocationId data type tinyint, smallint, int, bigint, or something else?
Is the character code data type char(2), nchar(2), or something else?
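
These data types matter because they determine how wide the key is. A quick way to check the byte sizes, assuming datetime, int and char(2) purely for illustration (the question does not state the actual types):

```sql
-- Assumed types only; swap in the real column types to get the real widths.
SELECT
    DATALENGTH(CAST(GETDATE() AS datetime)) AS datetime_bytes,         -- 8
    DATALENGTH(CAST(1 AS int))              AS location_id_bytes,      -- 4
    DATALENGTH(CAST('AB' AS char(2)))       AS character_code_bytes,   -- 2
    DATALENGTH(CAST(1 AS int))              AS int_surrogate_bytes,    -- 4
    DATALENGTH(CAST(1 AS bigint))           AS bigint_surrogate_bytes; -- 8
```

With these assumed types the three key columns add up to 14 bytes. Other choices change the total; for example, nchar(2) takes 4 bytes instead of 2.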

More Questions:

How many rows are in your table?
Would you need an int or bigint for a surrogate id?
What are the other columns in your table?
Are all queries filtered by datetime, or are some filtered by location id and/or character code?
Are you always returning the whole row, or sometimes just pieces of it?

Because most people assume the Primary Key is the Clustering Key, I am going to interpret your question as "How to index a composite clustering key with non-sequential inserts?".

The situation you are considering is related to debates like The Clustered Index Debate and Surrogate Key vs. Natural Key.

In this situation, I would want to consider the impact of a [16?] byte composite clustering key vs a 4 or 8 byte clustering key (int / bigint). My decision tree would look something like this:

Will we be using Hekaton (In-Memory OLTP)?
- Yes => composite key. Run away.
- No => Good call, continue...
How many rows will this table have?
- Tens of millions, maybe more! => surrogate key (probably).
  - If the data length of each row will be variable and not narrow => surrogate key.
  - If the data length of the row will be fixed and narrow, and it results in optimal page usage => continue...
- Less than that => continue...
How will the table be queried?
- By various combinations of datetime, location_id, character_code, and not always all three => surrogate key.
  - In this case you may want multiple supporting indexes on the combinations of datetime, location_id, character_code that your queries use. The clustering key is included in every non-clustered index, so the larger it is, the more space/pages each index entry requires. => surrogate key.
- Almost always by all three of datetime, location_id, character_code, or almost always by datetime alone or by datetime, location_id; but not by location_id without datetime, and not by character_code without datetime, location_id (so zero or only a couple of non-clustered indexes on this table) => continue...
Will any other table reference this table?
- Yes => surrogate key.
- No => composite key is a reasonable option
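
The two outcomes of the tree above could look roughly like this. This is only a sketch: the table names, column names, the reading_value column and the data types are all assumptions, since the question does not give the actual schema.

```sql
-- Option A: narrow surrogate clustering key. The natural key is still
-- enforced, but through a unique non-clustered constraint.
CREATE TABLE dbo.Reading_Surrogate
(
    reading_id     int IDENTITY(1,1) NOT NULL,
    reading_time   datetime          NOT NULL,
    location_id    int               NOT NULL,
    character_code char(2)           NOT NULL,
    reading_value  decimal(18,4)     NULL,  -- placeholder payload column
    CONSTRAINT PK_Reading_Surrogate
        PRIMARY KEY CLUSTERED (reading_id),
    CONSTRAINT UQ_Reading_Surrogate_NaturalKey
        UNIQUE NONCLUSTERED (reading_time, location_id, character_code)
);

-- Option B: the composite natural key is also the clustering key.
CREATE TABLE dbo.Reading_Composite
(
    reading_time   datetime          NOT NULL,
    location_id    int               NOT NULL,
    character_code char(2)           NOT NULL,
    reading_value  decimal(18,4)     NULL,  -- placeholder payload column
    CONSTRAINT PK_Reading_Composite
        PRIMARY KEY CLUSTERED (reading_time, location_id, character_code)
);
```

In option A new rows are appended at the end of the clustered index regardless of their datetime, so the out-of-order weekly loads fragment only the non-clustered natural-key index. In option B the rows themselves are stored in key order, so the loads from a year ago land in the middle of the table.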

Even if my first run through the above decision tree leads me to a composite key, I would probably start my design with a surrogate key, because it is easier to drop a surrogate key that turns out to be unused than to go back later, add one, and implement its use.

Just to clarify, I have had cases where I did find that the composite key was a better solution and did refactor the design to drop the surrogate key. I don't want to leave the impression that the surrogate key is always the better solution, even if it is a common default for many designers (including myself).
