Sql-server – How to index a composite primary key with non-sequential inserts

clustered-index, index, sql-server, sql-server-2008-r2

I have a composite primary key consisting of a datetime, a location id and a 2-character code. This key uniquely identifies each row of my table.

I know that loads into this table will alternate: a week of data from a year ago, then a week of data from this year. The table is therefore likely to become fragmented as the primary key jumps back and forth.

I would like to know what sort of indexing strategies can be employed in this scenario. My gut feeling is that the default clustered index is a bad idea and I should replace that with a non-clustered index, which would make rebuilds easier.

Best Answer

If you are worried about fragmentation, do you have a maintenance solution on your server that addresses it, such as Ola Hallengren's SQL Server Maintenance Solution?


Questions:

  • Is the datetime data type date, smalldatetime, datetime, datetime2(0-2), datetime2(3-4), datetime2(5+), or something else?
  • Is the LocationId data type tinyint, smallint, int, bigint, or something else?
  • Is the character code data type char(2), nchar(2), or something else?
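The answers matter because they set the width of the composite key. As a rough sketch, the arithmetic below adds up the documented SQL Server storage sizes for each candidate type. The function name `composite_key_width` and the lookup tables are illustrative only, and the figures assume a single-byte collation for char(2) and ignore per-row overhead.

```python
# Storage size in bytes of each candidate SQL Server type (fixed-width types).
DATETIME_BYTES = {
    "date": 3,
    "smalldatetime": 4,
    "datetime": 8,
    "datetime2(0-2)": 6,
    "datetime2(3-4)": 7,
    "datetime2(5-7)": 8,
}
LOCATION_BYTES = {"tinyint": 1, "smallint": 2, "int": 4, "bigint": 8}
# char(2) is 2 bytes under a single-byte collation; nchar(2) is 2 bytes per character.
CODE_BYTES = {"char(2)": 2, "nchar(2)": 4}


def composite_key_width(datetime_type, location_type, code_type):
    """Return the combined width of the three key columns in bytes.

    Hypothetical helper for illustration; row and index overhead are excluded.
    """
    return (DATETIME_BYTES[datetime_type]
            + LOCATION_BYTES[location_type]
            + CODE_BYTES[code_type])


narrowest = composite_key_width("date", "tinyint", "char(2)")
typical = composite_key_width("datetime", "int", "char(2)")
widest = composite_key_width("datetime", "bigint", "nchar(2)")
print(narrowest, typical, widest)  # 6 14 20
```

For comparison, an int surrogate key is 4 bytes and a bigint surrogate key is 8 bytes.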

More Questions:

  • How many rows are in your table?
  • Would you need an int or bigint for a surrogate id?
  • What are the other columns in your table?
  • Are all queries filtered by datetime, or are some filtered only by the locationid and/or character code?
  • Are you always returning the whole row, or sometimes just pieces of it?

Because most people assume the Primary Key is the Clustering Key, I am going to interpret your question as "How to index a composite clustering key with non-sequential inserts?".

The situation you are considering is related to debates like The Clustered Index Debate and Surrogate Key vs. Natural Key.

In this situation, I would want to consider the impact of a [16?] byte composite clustering key vs a 4 or 8 byte clustering key (int / bigint). My decision tree would look something like this:

  1. Will we be using Hekaton (In-Memory OLTP)?

    • Yes => composite key. Run away.

    • No => Good call, continue...

  2. How many rows will this table have?

    • Tens of millions, maybe more! => surrogate key (probably).

      • If the data length of each row will be variable and not narrow => surrogate key.

      • If the data length of the row will be fixed and narrow, and it results in optimal page usage => continue...

    • Less than that => continue...

  3. How will the table be queried?

    • By various combinations of datetime, location_id, character_code, and not always all three => surrogate key.

      • In this case you may want multiple supporting indexes on the combinations of datetime, location_id, character_code that your queries use. The clustering key is included in every non-clustered index, and the larger it is, the more space/pages each index entry requires. => surrogate key.
    • Almost always by all three (datetime, location_id, character_code), or almost always by datetime alone or by datetime, location_id; but never by location_id without datetime, nor by character_code without datetime, location_id (so zero or only a couple of non-clustered indexes on this table) => continue...

  4. Will any other table reference this table?

    • Yes => surrogate key.

    • No => composite key is a reasonable option
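The decision tree above can be written as a short function. This is only a sketch of the four questions in order: the function and parameter names are made up for illustration, the "probably" in step 2 is treated as a firm answer, and the Hekaton branch is encoded exactly as written (composite key), with no further interpretation.

```python
def suggest_clustering_key(*, uses_in_memory_oltp, tens_of_millions_of_rows,
                           rows_fixed_and_narrow, queries_lead_with_datetime,
                           referenced_by_other_tables):
    """Walk the four questions in order and return 'composite' or 'surrogate'.

    Hypothetical helper: every parameter is a yes/no answer to one question
    in the decision tree, not a setting read from SQL Server.
    """
    # 1. In-Memory OLTP (Hekaton): the tree stops at the composite key.
    if uses_in_memory_oltp:
        return "composite"
    # 2. Large table: surrogate key unless rows are fixed-length and narrow.
    if tens_of_millions_of_rows and not rows_fixed_and_narrow:
        return "surrogate"
    # 3. Queries that do not lead with datetime need several supporting
    #    non-clustered indexes, each of which carries the clustering key.
    if not queries_lead_with_datetime:
        return "surrogate"
    # 4. Other tables referencing this one would have to carry the wide key.
    if referenced_by_other_tables:
        return "surrogate"
    return "composite"


# The scenario from the question, assuming a modest, unreferenced table
# that is always queried by datetime first.
answer = suggest_clustering_key(
    uses_in_memory_oltp=False,
    tens_of_millions_of_rows=False,
    rows_fixed_and_narrow=True,
    queries_lead_with_datetime=True,
    referenced_by_other_tables=False,
)
print(answer)  # composite
```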


Even if my first pass through the above decision tree leads me to a composite key, I would probably start my design with a surrogate key, because it is easier to drop an unused surrogate key later than to go back, add one, and implement its use.

Just to clarify, I have had cases where I did find that the composite key was a better solution and did refactor the design to drop the surrogate key. I don't want to leave the impression that the surrogate key is always the better solution, even if it is a common default for many designers (including myself).

