Sql-server – Does this computed CHECKSUM() index design make sense

indexsql server

I have come across what looks to me like a slightly odd pattern in a SQL Server 2005 database I'm taking care of, and was wondering whether it's just me, or whether it really is odd.

There are a number of tables with uniqueidentifier primary keys, which also have a computed column which is the CHECKSUM of that key, e.g.

[CustomerGuid] [uniqueidentifier] ROWGUIDCOL  NOT NULL,
[CustomerHash]  AS (CHECKSUM([CustomerGuid])) PERSISTED,

Then, there are indexes which contain both of these fields, e.g.

CREATE NONCLUSTERED INDEX [IX_Customer_CustomerHashAndGuid] ON [dbo].[Customer] 
(
    [CustomerHash] ASC,
    [CustomerGuid] ASC
)

This pattern also pops up with Guids that are not primary keys – e.g., an Order table with CustomerGuid and CustomerHash for each order, and an index on those two columns for looking up orders by customer.

Surely, the whole point of a checksum is that you create an index just on the checksum, so a SELECT will retrieve the records that match the checksum, and then compare the underlying value as a safety check? Doesn't putting the underlying value in the index waste a bunch of space for no real gain?

Best Answer

You are right, this is pointless.

Two (of many) reasons that I see it's wrong

  • it isn't guaranteed unique (CHECKSUM gives int) whereas the GUID is (over the range of GUID). It's a small chance of duplicate but quite possible: like the "birthday problem" somewhat

  • it's still random order. The main reason IDENTITY is better then GUID for a clustered index is that IDENTITY is monotonically increasing. CHECKSUM(someGUID) is random order too

I'd add a new IDENTITY column, and then start changing dependencies to use this only.