SQL Server – Uniquifier on Non-Unique Clustered Index vs. Unique Clustered Index

index-tuningsql-server-2008-r2uniqueidentifier

Lets say for example that I have a table structured this way:

ID1 int
ID2 int
ID3 int
DTM datetime

I'm only able to create a unique clustered index on the table if I include ID1, ID2, and ID3 (a combination of two of them or individually would lead to duplicate rows).

Would it be better performance wise to create a clustered index by ID1, ID2, and ID3? Or create a clustered index on just ID1 and have SQL Server add a uniquifier to the index to make each row unique?

I know that the uniquifier is 4 bytes in size (same for ints) so technically it would save 4 bytes per row if I make the latter index, but I'm not sure how this would affect my queries.

Best Answer

One of the most important considerations for a clustered index in terms of performance is that it be ever increasing (or decreasing) and not something that will be changed (except possibly very rarely). The clustered index represents the physical order of your table. If you are constantly inserting into the middle of the index, or modifying the values of your clustered index then you will get bad page splits where SQL has to move part of the data from one page and move it into another one in order to make room for the new data. These moves take time and cause fragmentation that degrades performance.

That being said the size of the clustered index, while important, is not your biggest concern. I did some experiments with using date columns (with a uniqueifier) vs an int column and found that if my queries were date based I still got a big bump in performance.

My suggestion to you is that if ID1, ID2 and ID3 are ever increasing (and rarely change) then use that as your clustered index. If not, and you still want to enforce uniqueness then either make them a non-clustered primary key or a non-clustered unique key. If they are not ever increasing then you can consider your date column or just create a surrogate key for the table. Use an int data type and make it an identity column.

If you do create a surrogate key then you can create a non-clustered index on ID1 to improve the performance on those queries assuming you didn't already create a unique index for it. If you frequently need to return, say ID2 and DTM then you could INCLUDE those in your index to additionally improve your performance.