Recently I was reading this blog post:
http://blogs.msdn.com/b/sqlazure/archive/2010/05/05/10007304.aspx
which contains this section:
Picking a Clustered Index
There are a couple of strategies for picking
your clustered indexes; one of the easiest and best is to add another
column of data type datetime and use that column for your clustered
index. Here is what you need to do:
Add the column as data type datetime
I usually call it Date
Set the Default Value to GetDate().
Make it non-null.
Create your clustered index on it before you insert data into you.
My question is would this ever create two identical values for the Date? Does this answer change if parallelism is in use? (Assume value never specified, always comes from GetDate())
I believe I'm correct in assuming it wouldn't matter due to the behind-the-scenes uniqueifier being added, right? But I'm interested anyway.
I'm asking from a SQL2008R2 perspective but would be interested if the answer differs for any version of SQL Server from 7.0 up.
Best Answer
GETDATE()
is not guaranteed to be unique, no. Especially if it is a datetime, where milliseconds are rounded up or down, and of course when not all data is coming from the same user, you are almost guaranteed to have collisions.Of course, a clustered index does not need to be unique, since SQL Server will make it so if it isn't (but only when it needs to). If you need to identify a specific row yourself (as opposed to just the uniquifier for SQL Server's internal usage), and there are no other candidate key columns (this is possible with things like event logging tables), you could add a non-clustered primary key that is, say, an IDENTITY column. Or if you really want web scale - and care more about insert performance than storage or any subsequent use of the data - you could use a uniqueidentifier column populated with NEWID().
Example
Let's take a look at an example, and see the differences.
Insert speed
I populated all three tables with about 500,000 rows, using the following script:
Results:
Scan Speed
Results:
Space used
Looking at simple results from
sp_spaceused
:Results:
The uniquifier takes up less space than the IDENTITY column (and obviously both take less than the GUID), since it is only needed for collisions (and probably other optimizations I'm not aware of, such as compression).
We can also look at the index pages for the clustered index on the datetime column (index id = 1) and the non-clustered primary key (index id = 2):
Page/Row Contents
And finally, we can take a look at a specific page to see what is stored in a row. I simply took the first row from each of the clustered index DBCC IND results from above (your page id values will almost certainly be different):
Uniquifier - pay particular attention to length/record size:
IDENTITY seems to have 4 extra bytes in the clustered index:
GUID has an extra 16 bytes in the clustered index:
Conclusion
Seems to me, by all accounts, you are better off letting the uniquifier do its own thing (assuming you don't expect to be able to differentiate between two rows with the exact same date/time value). The only time you can get in trouble with this, if any single value is duplicated 2,147,483,648 times, at which point you will overflow the uniquifier range of integer.