Sql-server – Clustered Primary Key that is never used vs. Non-Clustered Primary Key on Multiple Columns

sql-server, table

I am working on a table design for Customer Totals and trying to make a decision about the primary key. I was going to go with a surrogate identity column with a clustered index, but this column would NEVER be used. The candidate primary key columns are CustomerNumber + AccountNumber, because these are the unique identifiers for each row, but these will NOT be sequentially inserted.

Basically, on a daily basis a report will be run which will update each CustomerNumber + AccountNumber record with the most recent purchase total and total date.

Does it make sense to remove CustomerTotalID completely and have CustomerNumber + AccountNumber be a PK with a NON-clustered index?

    CREATE TABLE CustomerTotals (
        CustomerTotalID INT IDENTITY(1,1),
        CustomerNumber INT,
        AccountNumber INT,
        PurchaseTotal DECIMAL(10,2),
        TotalDate DATE,
        CONSTRAINT [PK_CustomerTotals] PRIMARY KEY CLUSTERED (
            CustomerTotalID ASC
        )
    )
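
As a rough sketch of the alternative asked about here (not a recommendation), dropping CustomerTotalID and putting a non-clustered primary key on the two natural key columns might look like the following. The explicit NOT NULL and the sample values in the update are assumptions added for illustration. With no clustered index defined, SQL Server stores the table as a heap.

```sql
-- Sketch only: the alternative from the question, with no surrogate column.
-- Table and column names follow the question; NOT NULL is written out
-- because primary key columns cannot be nullable.
CREATE TABLE CustomerTotals (
    CustomerNumber INT NOT NULL,
    AccountNumber  INT NOT NULL,
    PurchaseTotal  DECIMAL(10,2),
    TotalDate      DATE,
    CONSTRAINT PK_CustomerTotals
        PRIMARY KEY NONCLUSTERED (CustomerNumber, AccountNumber)
);

-- Hypothetical values standing in for one row of the daily report.
DECLARE @CustomerNumber INT = 1001;
DECLARE @AccountNumber  INT = 1;
DECLARE @PurchaseTotal  DECIMAL(10,2) = 250.00;
DECLARE @TotalDate      DATE = '20240131';

-- The daily update locates each row through the primary key columns.
UPDATE CustomerTotals
SET PurchaseTotal = @PurchaseTotal,
    TotalDate     = @TotalDate
WHERE CustomerNumber = @CustomerNumber
  AND AccountNumber  = @AccountNumber;
```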

Best Answer

I'm working on a similar problem right now, except with more columns and millions of rows. We have a PK that isn't used in any queries. We ran a preliminary test in which we changed it to a non-clustered PK; I then found two columns that are used in WHERE clauses and created a clustered index on those.

Many queries ran faster and out of more than two dozen indexes on the table we think we will be able to delete 25 of them.

In theory, what we are looking to do is not best practice, because the columns we're looking to use for the clustered index aren't unique. But in practice it allows us to save a lot of space by making many non-clustered indexes unneeded, and to improve I/O by deleting these indexes.
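
A minimal sketch of that kind of change, assuming a hypothetical table dbo.OrderFacts whose queries filter on RegionCode and OrderDate (every name and type here is invented for illustration, not taken from the real system):

```sql
-- Hypothetical starting point: surrogate key as the clustered primary key.
CREATE TABLE dbo.OrderFacts (
    OrderFactID INT IDENTITY(1,1) NOT NULL,
    RegionCode  INT NOT NULL,
    OrderDate   DATE NOT NULL,
    Amount      DECIMAL(10,2) NULL,
    CONSTRAINT PK_OrderFacts PRIMARY KEY CLUSTERED (OrderFactID)
);

-- Step 1: drop the clustered primary key; the table becomes a heap.
-- (This fails if foreign keys still reference the constraint.)
ALTER TABLE dbo.OrderFacts DROP CONSTRAINT PK_OrderFacts;

-- Step 2: cluster on the two columns used in WHERE clauses.
-- The index is not declared UNIQUE, so duplicate key values are allowed.
CREATE CLUSTERED INDEX CIX_OrderFacts_RegionCode_OrderDate
    ON dbo.OrderFacts (RegionCode, OrderDate);

-- Step 3: re-create the primary key as a non-clustered constraint.
ALTER TABLE dbo.OrderFacts
    ADD CONSTRAINT PK_OrderFacts PRIMARY KEY NONCLUSTERED (OrderFactID);
```

Creating the clustered index before re-adding the primary key means the non-clustered index is built once, against the final clustering key. On a table that already carries many non-clustered indexes, swapping the clustered index rebuilds them, so a change like this is probably best tried on a copy first.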

Related Solutions

Sql-server – Is ‘Avoid creating a clustered index based on an incrementing key’ a myth from SQL Server 2000 days

The myth goes back to before SQL Server 6.5, which added row-level locking, and is hinted at by Kalen Delaney.

It was to do with "hot spots" of data page usage and the fact that a whole 2k page (SQL Server 7 and higher use 8k pages) was locked, rather than just the inserted row.

Edit, Feb 2012

Found an authoritative article by Kimberly L. Tripp:

"The Clustered Index Debate Continues..."

Hotspots were something that we greatly tried to avoid PRIOR to SQL Server 7.0 because of page level locking (and this is where the term hot spot became a negative term). In fact, it doesn't have to be a negative term. However, since the storage engine was rearchitected/redesigned (in SQL Server 7.0) and now includes true row level locking, this motivation (to avoid hotspots) is no longer there.

Edit, May 2013

The link in lucky7_2000's answer seems to say that hotspots can exist and that they cause issues. However, the article uses a non-unique clustered index on TranTime, which requires a uniquifier to be added. That means the index is not strictly monotonically increasing (and is too wide). The link in that answer does not contradict this answer or my links.

On a personal level, I have worked on databases where I inserted tens of thousands of rows per second into a table that has a bigint IDENTITY column as the clustered PK.

Sql-server – SQL Server Primary key / clustered index design decision

You are correct to separate "clustered index" from "primary key":

A clustered index is the organisation of data on disk. It works best when the key is
- narrow
- numeric
- increasing (strictly monotonic)

The primary key identifies a row.

Note: GUIDs make poor clustering keys

In this case, with the surrogate column, the table has 2 candidate keys:

ProductHistoryID
ProductNo + CreatedDateTime

Assumed convention states that the ProductHistoryID becomes the PK, but you can leave the PK on (ProductNo, CreatedDateTime): it will just be non-clustered. Which leads to indexes:

clustered index should be on ProductHistoryID
unique non-clustered index on (ProductNo, CreatedDateTime)

Example

    CREATE TABLE Product (
        ProductHistoryID int NOT NULL IDENTITY (1,1),
        ProductNo ...
        CreatedDateTime ...

then you have a choice of

    CONSTRAINT PK_Product PRIMARY KEY CLUSTERED (ProductHistoryID),
    CONSTRAINT UQ_Product UNIQUE NONCLUSTERED (ProductNo, CreatedDateTime)

or

    CONSTRAINT PK_Product PRIMARY KEY NONCLUSTERED (ProductNo, CreatedDateTime),
    CONSTRAINT UQ_Product UNIQUE CLUSTERED (ProductHistoryID)
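
Put together, the variant that keeps the primary key on the natural key and clusters on the surrogate might look like the sketch below. The answer leaves the column types elided, so the int and datetime2(3) types here are assumptions chosen only to make the statement runnable.

```sql
-- Sketch: primary key on the natural key, clustering on the surrogate.
CREATE TABLE Product (
    ProductHistoryID int NOT NULL IDENTITY (1,1),
    ProductNo        int NOT NULL,           -- assumed type
    CreatedDateTime  datetime2(3) NOT NULL,  -- assumed type
    CONSTRAINT PK_Product PRIMARY KEY NONCLUSTERED (ProductNo, CreatedDateTime),
    CONSTRAINT UQ_Product UNIQUE CLUSTERED (ProductHistoryID)
);

-- Inspect which index backs which constraint.
SELECT name, type_desc, is_primary_key, is_unique_constraint
FROM sys.indexes
WHERE object_id = OBJECT_ID(N'Product');
```

The catalog query should report UQ_Product as the CLUSTERED index and PK_Product as a NONCLUSTERED one, which is the separation of "clustered index" from "primary key" described above.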

Also, the pattern you have is a "type 2 Slowly Changing Dimension"
