I have come across what looks to me like a slightly odd pattern in a SQL Server 2005 database I'm taking care of, and was wondering whether it's just me, or whether it really is odd.
There are a number of tables with uniqueidentifier
primary keys, which also have a computed column which is the CHECKSUM
of that key, e.g.
[CustomerGuid] [uniqueidentifier] ROWGUIDCOL NOT NULL,
[CustomerHash] AS (CHECKSUM([CustomerGuid])) PERSISTED,
Then, there are indexes which contain both of these fields, e.g.
CREATE NONCLUSTERED INDEX [IX_Customer_CustomerHashAndGuid] ON [dbo].[Customer]
(
[CustomerHash] ASC,
[CustomerGuid] ASC
)
This pattern also pops up with Guids that are not primary keys – e.g., an Order
table with CustomerGuid
and CustomerHash
for each order, and an index on those two columns for looking up orders by customer.
Surely, the whole point of a checksum is that you create an index just on the checksum, so a SELECT
will retrieve the records that match the checksum, and then compare the underlying value as a safety check? Doesn't putting the underlying value in the index waste a bunch of space for no real gain?
Best Answer
You are right, this is pointless.
Two (of many) reasons that I see it's wrong
it isn't guaranteed unique (CHECKSUM gives int) whereas the GUID is (over the range of GUID). It's a small chance of duplicate but quite possible: like the "birthday problem" somewhat
it's still random order. The main reason IDENTITY is better then GUID for a clustered index is that IDENTITY is monotonically increasing.
CHECKSUM(someGUID)
is random order tooI'd add a new IDENTITY column, and then start changing dependencies to use this only.