Sql-server – SQL Server Primary key / clustered index design decision

database-designsql server

Looking for some advice regarding a table / index design decision I've got to make on some tables that I've got to port into SQL server from an existing 4GL based database.

I've got a product history table that is inserted into frequently (never updated) and the table has this kind of structure

  1. ProductNo String(20)
  2. CreatedDateTime DateTime
  3. Description String(100)

At the moment the primary key is made up of a combination of ProductNo and CreatedDateTime in an attempt to define a unique index key. We can have many records per productno.

I'll be creating some 1 to 1 related tables and don't want to carry both the productno and the createddatetime fields into the related tables to act as foriegn keys. I also think this combination is a little fragile in order to guarantee uniqueness.

So, I'm planning to add a new field to the table 'ProductHistoryPK' as an incrementing Int or SequentialGuid to act as the primary key and a foreign key to related tables.

In terms of indexes I'm thinking of creating

  1. Non-clustered primary key on the new ProductHistoryPK field.
  2. Clustered Index on the ProductNo field as this is field that is
    often searched on.

Any thoughts or pointers regarding this?

Thanks…

Best Answer

You are correct to separate "clustered index" from "primary key":

  • A clustered index is the organisation of data on disk is better if
    • narrow
    • numeric
    • increasing (strictly monotonic)
  • The primary key identifies a row

Note: GUIDs make poor clustering keys

In this case, with the surrogate column, the table has 2 candidate keys:

  • ProductHistoryID
  • ProductNo + CreatedDateTime

Assumed convention states that the ProductHistoryID becomes the PK, but you can leave the PK on (ProductNo, CreatedDateTime): it will just be non-clustered. Which leads to indexes:

  • clustered index should be on ProductHistoryID
  • unique non-clustered index on (ProductNo, CreatedDateTime)

Example

CREATE TABLE Product (
    ProductHistoryID int NOT NULL IDENTITY (1,1) NOT NULL,
    ProductNo ...
    CreatedDateTime ...

then you a choice of

    CONSTRAINT PK_Product PRIMARY KEY CLUSTERED (ProductHistoryID)
    CONSTRAINT UQ_Product UNIQUE NONCLUSTERED (ProductHistoryID)

or

    CONSTRAINT PK_Product PRIMARY KEY NONCLUSTERED (ProductNo, CreatedDateTime)
    CONSTRAINT PK_Product UNIQUE CLUSTERED (ProductHistoryID)

Also, the pattern you have is a "type 2 Slowly Changing Dimension"