SQL Server – Implications of Removing Unnecessary Columns from Primary Key

ddlprimary-keysql server

I have a database in which a number of tables are defined with multiple UUID columns forming the Primary Key when they should instead be defined as Foreign Keys, e.g.

CREATE TABLE [dbo].[SomeTable](
    [SomeTableID] [uniqueidentifier] NOT NULL,
    [WorkEntryID] [uniqueidentifier] NOT NULL,
    [TimeEntryID] [uniqueidentifier] NOT NULL,
    [RateEntryID] [uniqueidentifier] NOT NULL,
    ...

 CONSTRAINT [PK_SomeTable] PRIMARY KEY CLUSTERED 
 (
    [SomeTableID] ASC,
    [WorkEntryID] ASC,
    [TimeEntryID] ASC,
    [RateEntryID] ASC
 )
) ON [PRIMARY]
GO

ALTER TABLE [dbo].[SomeTable] ADD  CONSTRAINT [DF_SomeTable_SomeTableID]  DEFAULT (newsequentialid()) FOR [SomeTableID]
GO

The additional columns should have been defined as Foreign Keys and in some cases have been. I'm assuming there are performance implications of having the additional columns as part of the PK? Are there any considerations I should be aware of before redefining the PK for the tables to include only the PK column?

Best Answer

The ideal clustered index in SQL Server meets the following criteria:

  1. Unique
  2. Never changes
  3. New values are strictly ascending
  4. Narrow
  5. Queries against the table run faster because they can take advantage of range seeks or the ordering of the clustered index

In production you may not be able to pick a clustered index which meets all of those criteria, which is fine. You can only do the best that you can do. A clustered index of four GUIDs is an uncommon choice. I assume that it will meet the first two criteria but not the other three. One performance impact of the current index is that nonclustered indexes will take up additional disk space because they each store a copy of the clustered index. That's why clustered indexes are ideally narrow.

Based on the information provided in this question, you will likely be better off by changing the clustered index to just be the SomeTableID column. I'm making some assumptions here, but it's unlikely that you'll see performance degradation caused by that switch.

This answer addressed clustered indexes and not primary keys because the clustered index is the thing that matters for performance, which seems to be the focus of this question. Primary keys are to enforce data integrity and business constraints. Continuing to match the columns for the clustered index and the primary key should be fine in this case unless there is a business reason to keep the current primary key columns.