Sql-server – Clustered Index Selection – PK or FK

clustered-indexsql server

I have a SQL Server 2014 table that looks like the following:

OrderId     int           not null IDENTITY --this is the primary key column
OrderDate   datetime2     not null
CustomerId  int           not null
Description nvarchar(255) null

Some folks on my team have suggested that the clustered index should be on OrderId, but I think that the CustomerId + OrderId would be a better choice for the following reasons:

  • Almost all queries will be looking WHERE CustomerId = @param, not OrderId
  • CustomerId is a foreign key to the Customer table, so having a
    clustered index with CustomerId should speed up joins
  • While CustomerId isn't unique, having the additional OrderId column
    specified in the index will ensure uniqueness (We can use the UNIQUE keyword when creating the clustered index on those 2 columns, to avoid the overhead of not having uniqueness)
  • Once data is inserted, the CustomerId and OrderId never change, so these rows wouldn't be moving around after initial write.
  • Data access happens via an ORM that requests all columns by default, so when a query based on CustomerId comes in, the clustered index will be able to provide all columns without any additional work.

Does the CustomerId and OrderId approach sound like the best option given the above? Or, is OrderId on its own better, since it's a single column that's guaranteeing uniqueness by itself?

Currently, the table has a clustered index on OrderId, and a nonclustered index on CustomerId, but it's not covering, so since we're using an ORM and all columns are requested, it's extra work to retrieve them. So with this post, I'm trying to consider improving performance with a better CI.

The activity on our DB is about 85% reads and 15% writes.

Best Answer

Community wiki answer:

I think a composite clustered index key with CustomerID as the first column will be best since that's in the WHERE clause of nearly all queries.

There may be more splits compared to an incremental key (or more likely suboptimal page density for a time if you manage and maintain fill factor to avoid 'bad' splits). However, the overall performance improvement for customer queries is substantial, because the key lookup is avoided.

OrderID or OrderDate may be best for the second column depending on your most critical queries.

For example, if customers see a chronological list of recent orders after logging in to a web site, OrderDate should be next, to optimize ORDER BY OrderDate DESC.

If you choose OrderID as the clustered index, with a non-clustered index on CustomerID, you'll still get splits and fragmentation, just in the non-clustered index.