Sql-server – Clustered vs Non Clustered Indexes for a Primary Key

clustered-indexnonclustered-indexprimary-keysql serveruniqueidentifier

I have a large table with around 19 million records in it.

The primary key is a uniqueidentifier column called ID and there is a nonclustered index on this column. An application generates these GUID values.

The table currently has no clustered index on it.

I am considering changing from a nonclustered index to a clustered index, since the table currently has no clustered index. I am wondering what are the pros and cons of doing this or if I should?

Leave things as is
Find another column to create the clustered index on
Change this ID column over from a non clustered index to a clustered index.

We are getting about 50K inserts daily, with no deletes and no updates. Let me know if you need any additional details.

The table has 24 columns total and 5 nonclustered indexes without a clustered index:

Index 1- PK on ID (uniqueidentifier)
Index 2- TimeStamp (datetime)
Index 3- Hash Column (nvarchar(64))
Index 4- Covering index 4 columns
Index 5- Covering index 6 columns

I inherited the database from someone else. The application uses this GUID as a primary key with relationships to matching foreign keys in respective child tables so the queries use the uniqueidentifier id's in joins.

How do I determine if my GUID is generated sequentially? They don't appear similar. Some examples are as follows ordered by createddate:

24A689A0-831E-4670-9204-766256C58E43
ED001C69-11BA-430D-ACF9-3C3CADF3B400
768E5497-4F48-4C65-87D9-E9C27066708E

Doing a little research, it seems like a uniqueidentifier column is not a good candidate for a clustered index, so I'm wondering if I need to:

Look to a few columns to create a composite clustered index on these columns or
Live without a clustered index
Something else?

Let me know if I can provide any additional details. If anyone can give me some guidance as to what are some considerations for making this decision, I would appreciate that greatly.

Best Answer

Unless you switch to sequential GUID generation, you don't want the GUID as a clustered index key. So your choices are

1) Make the ID column the clustered index key. This will optimize joins and lookups on ID, but it sounds like those are rare.

2) For SQL 2016+ replace the Heap with a Clustered Columnstore index. This compresses the data and optimizes table scans, perhaps to the extent that you can drop some of your other indexes.

3) Pick your most used non-clustered index and make that the Clustered index.

4) Leave it as a Heap.

In any case, the "child tables" that use the GUID as a foreign key might benefit from having that foreign key column as the leading column in their clustered primary key.

Related Solutions

Sql-server – Clustered vs Nonclustered Index

Since we are talking about the clustered index, just because you defined the CI key column as ID, you still have the DeletedDate data in the leaf data pages of the index. That's the nature of the clustered index: It is the table data.

Because you are typically having queries that look like:

select *
from YourTable
where DeletedDate is null;

You will likely benefit from a filtered index.

create nonclustered index IX_YourFilteredNci
on YourTable(<Key Columns Here>)
where DeletedDate is not null;
go

I didn't explicitly put the key columns here (and nonkey columns through the use of the INCLUDE clause) because you didn't publish the DDL of your table.

As in my comment above to your question, the choice of key columns (not just columns, but also the order of the columns) will largely depend on your workload and the typical queries that would be using this index.

If you are looking to cover your query(ies), then you would need to ensure that the index satisfies all of the data required of the query(ies). Not to mention, if you have other WHERE clauses (besides your NULL check on DeletedDate) or joins to consider, then the order of your key columns can be the deciding factor between a scan or a seek. And even though it is filtered, and depending on how much data you have in the index, the penalty could be considerable.

Foreign Keys to tables where primary key is not clustered index

Taken from the MSDN article on FOREIGN KEY Constraints

A FOREIGN KEY constraint does not have to be linked only to a PRIMARY KEY constraint in another table; it can also be defined to reference the columns of a UNIQUE constraint in another table.

I think it is perfectly fine to define a UNIQUE CLUSTERED INDEX on an IDENTITY column and define the PRIMARY KEY on different columns, often with a NONCLUSTERED INDEX but you don't have to specify an index.

Best Answer

Related Solutions

Sql-server – Clustered vs Nonclustered Index

Foreign Keys to tables where primary key is not clustered index

Related Question