SQL Server – Does this computed CHECKSUM() index design make sense?

I have come across what looks to me like a slightly odd pattern in a SQL Server 2005 database I'm taking care of, and was wondering whether it's just me, or whether it really is odd.

There are a number of tables with uniqueidentifier primary keys that also have a persisted computed column holding the CHECKSUM of that key, e.g.:

[CustomerGuid] [uniqueidentifier] ROWGUIDCOL  NOT NULL,
[CustomerHash]  AS (CHECKSUM([CustomerGuid])) PERSISTED,

Then, there are indexes which contain both of these fields, e.g.

CREATE NONCLUSTERED INDEX [IX_Customer_CustomerHashAndGuid] ON [dbo].[Customer] 
(
    [CustomerHash] ASC,
    [CustomerGuid] ASC
)

This pattern also pops up with GUIDs that are not primary keys – e.g., an Order table with CustomerGuid and CustomerHash for each order, and an index on those two columns for looking up orders by customer.

Surely, the whole point of a checksum is that you create an index just on the checksum, so a SELECT will retrieve the records that match the checksum, and then compare the underlying value as a safety check? Doesn't putting the underlying value in the index waste a bunch of space for no real gain?
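The pattern the question has in mind, an index on the checksum alone with the GUID compared afterwards, could be sketched as follows. This is a self-contained illustration on a hypothetical temp table (#Customer) rather than the real dbo.Customer, and the index name is made up:

```sql
-- Hypothetical stand-in for dbo.Customer, with the same two columns.
CREATE TABLE #Customer (
    CustomerGuid uniqueidentifier NOT NULL PRIMARY KEY,
    CustomerHash AS (CHECKSUM(CustomerGuid)) PERSISTED
);

-- Narrow index on the 4-byte checksum only.
CREATE NONCLUSTERED INDEX IX_Customer_CustomerHash
    ON #Customer (CustomerHash);

-- Two sample rows (single-row INSERTs also work on SQL Server 2005).
INSERT INTO #Customer (CustomerGuid) VALUES (NEWID());
INSERT INTO #Customer (CustomerGuid) VALUES (NEWID());

-- Pick one existing GUID to look up.
DECLARE @guid uniqueidentifier;
SELECT TOP 1 @guid = CustomerGuid FROM #Customer;

-- Seek on the hash, then compare the GUID itself so that rows whose
-- checksum merely collides with @guid are filtered out.
SELECT CustomerGuid, CustomerHash
FROM   #Customer
WHERE  CustomerHash = CHECKSUM(@guid)
  AND  CustomerGuid = @guid;
```

One thing worth checking in the real schema: if CustomerGuid is the clustered primary key (the default for a PRIMARY KEY constraint, as in this demo), SQL Server already stores it in every row of a nonclustered index as the row locator, so even the narrow index above carries the GUID.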

Best Answer

You are right, this is pointless.

Two (of many) reasons I see that it's wrong:

It isn't guaranteed unique: CHECKSUM returns an int, whereas the GUID is unique (over the range of GUIDs). Any two rows are unlikely to share a checksum, but across a whole table a duplicate is quite possible, somewhat like the "birthday problem".
It's still in random order. The main reason IDENTITY is better than GUID for a clustered index is that IDENTITY is monotonically increasing. CHECKSUM(someGUID) is in random order too.
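To put a rough number on the birthday-problem point: if CHECKSUM spread GUIDs uniformly over its 2^32 possible int values (an idealizing assumption; it is not designed as a high-quality hash), the usual approximation for the probability p(n) of at least one duplicate among n rows is:

```latex
p(n) \approx 1 - \exp\!\left(-\frac{n^2}{2 \cdot 2^{32}}\right),
\qquad
p(n) = \tfrac{1}{2} \;\text{ when }\; n \approx \sqrt{2^{33} \ln 2} \approx 77{,}000
```

So under that assumption a table of only tens of thousands of rows already has roughly even odds of containing two GUIDs with the same checksum.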

I'd add a new IDENTITY column, and then start changing dependencies to use only that column.
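A first step toward that suggestion might look like the sketch below. It uses a hypothetical temp table whose GUID key is already nonclustered; in the real dbo.Customer, if the GUID primary key is clustered, it would first have to be recreated as NONCLUSTERED (after dropping the foreign keys that reference it), and each dependent table would still need migrating separately.

```sql
-- Hypothetical table whose GUID primary key is already nonclustered.
CREATE TABLE #CustomerMigration (
    CustomerGuid uniqueidentifier NOT NULL PRIMARY KEY NONCLUSTERED,
    CustomerHash AS (CHECKSUM(CustomerGuid)) PERSISTED
);

INSERT INTO #CustomerMigration (CustomerGuid) VALUES (NEWID());
INSERT INTO #CustomerMigration (CustomerGuid) VALUES (NEWID());

-- Step 1: add the surrogate key; existing rows are numbered automatically.
ALTER TABLE #CustomerMigration
    ADD CustomerId int IDENTITY(1, 1) NOT NULL;

-- Step 2: cluster on the ever-increasing key so new rows are appended
-- at the end of the index instead of landing at random positions.
CREATE UNIQUE CLUSTERED INDEX CIX_CustomerMigration_CustomerId
    ON #CustomerMigration (CustomerId);

-- In the real schema, once dependent tables reference CustomerId,
-- the hash column and indexes such as IX_Customer_CustomerHashAndGuid
-- could be dropped.
```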

Related Solutions

SQL Server – SQL Server: primary keys advice to the whitepaper needed

Your points are unrelated to database design: the choice of a natural or surrogate key is an implementation decision made after the conceptual and logical models are complete.

In addition to comments and other answers:

Some natural keys work well, such as currency or language codes (CHF, GBP, DE, EN, etc.).
Avoiding composite keys forces you to always join through intermediate tables, rather than joining parent to grandchild directly.
Adding a surrogate key is unnecessary for link tables.

Edit: example of "composite keys"

Assume t1 has child t2, which has child t3.

If you carry the key of t1 in t3 (a composite key), you can join t1 and t3 directly.
The t1 key is also the leftmost column of the t3 key, so you don't need an extra index.
With a surrogate key/FK, you have to join via t2.
You need extra indexes on the FK columns in t2 and t3.

This latter option, driven by the "always use surrogate keys" dogma:

adds complexity
reduces or even reverses the disk space "savings"
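The t1/t2/t3 example can be sketched in T-SQL as below. The table and column names (#t1, t1_id, t2_no and so on, plus the surrogate-key variants #s1 to #s3) are hypothetical. Foreign keys are left out only because SQL Server does not enforce them on temporary tables; a real schema would declare them.

```sql
-- Composite keys: each child carries its ancestors' keys.
CREATE TABLE #t1 (t1_id int NOT NULL PRIMARY KEY);
CREATE TABLE #t2 (t1_id int NOT NULL, t2_no int NOT NULL,
                  PRIMARY KEY (t1_id, t2_no));
CREATE TABLE #t3 (t1_id int NOT NULL, t2_no int NOT NULL, t3_no int NOT NULL,
                  PRIMARY KEY (t1_id, t2_no, t3_no));

INSERT INTO #t1 VALUES (1);
INSERT INTO #t2 VALUES (1, 1);
INSERT INTO #t3 VALUES (1, 1, 1);
INSERT INTO #t3 VALUES (1, 1, 2);

-- Parent to grandchild directly; t1_id leads #t3's primary key,
-- so that index already supports the join.
SELECT p.t1_id, g.t3_no
FROM   #t1 AS p
JOIN   #t3 AS g ON g.t1_id = p.t1_id;

-- Surrogate keys: #s3 only knows its immediate parent.
CREATE TABLE #s1 (s1_id int IDENTITY PRIMARY KEY);
CREATE TABLE #s2 (s2_id int IDENTITY PRIMARY KEY, s1_id int NOT NULL);
CREATE TABLE #s3 (s3_id int IDENTITY PRIMARY KEY, s2_id int NOT NULL);

-- The extra indexes on the FK columns.
CREATE INDEX IX_s2_s1_id ON #s2 (s1_id);
CREATE INDEX IX_s3_s2_id ON #s3 (s2_id);

-- Reaching the grandparent means joining through #s2.
SELECT p.s1_id, g.s3_id
FROM   #s1 AS p
JOIN   #s2 AS c ON c.s1_id = p.s1_id
JOIN   #s3 AS g ON g.s2_id = c.s2_id;
```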

SQL Server – Does this query make sense?

As far as I understand the semantics of Sybase's non-standard GROUP BY, a purely mechanical rewrite would be:

WITH T
     AS (SELECT person_id,
                start_date,
                MAX(start_date) OVER (PARTITION BY person_id) AS max_start_date
         FROM   leaveperiods
         WHERE  group_id = 146)
SELECT person_id
FROM   T
WHERE  start_date = max_start_date

But the query does seem odd.

Documentation Extract

For example, many versions of SQL do not allow the inclusion of the extended title_id column in the select list, but it is legal in Transact-SQL:
SELECT type,
       title_id,
       avg(price),
       avg(advance)
FROM   titles
GROUP  BY type 
The above example still aggregates the price and advance columns based on the type column, but its results also display the title_id for the books included in each group.

+--------------+----------+------------+--------------+
|     type     | title_id | avg(price) | avg(advance) |
+--------------+----------+------------+--------------+
| mod_cook     | MC3021   | 11.49      | 7,500.00     |
| UNDECIDED    | MC3026   | NULL       | NULL         |
| popular_comp | PC1035   | 21.48      | 7,500.00     |
| popular_comp | PC8888   | 21.48      | 7,500.00     |
| popular_comp | PC9999   | 21.48      | 7,500.00     |
| psychology   | PS1372   | 13.50      | 4,255.00     |
| psychology   | PS2091   | 13.50      | 4,255.00     |
| psychology   | PS2106   | 13.50      | 4,255.00     |
| psychology   | PS3333   | 13.50      | 4,255.00     |
| psychology   | PS7777   | 13.50      | 4,255.00     |
| trad_cook    | TC3218   | 15.96      | 6,333.33     |
| trad_cook    | TC4203   | 15.96      | 6,333.33     |
| trad_cook    | TC7777   | 15.96      | 6,333.33     |
+--------------+----------+------------+--------------+
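SQL Server does not accept the extended title_id column: it raises an error because title_id is neither aggregated nor listed in GROUP BY. Roughly the same shape of output (one row per title, with its type's averages repeated) can be produced with windowed aggregates, which SQL Server supports from 2005 onward. A sketch on a hypothetical #titles table with made-up rows:

```sql
-- Hypothetical stand-in for the titles table, reduced to the columns used.
CREATE TABLE #titles (
    title_id varchar(6)    NOT NULL PRIMARY KEY,
    type     varchar(12)   NOT NULL,
    price    decimal(10,2) NULL,
    advance  decimal(10,2) NULL
);

-- Made-up sample rows, not the data shown in the extract above.
INSERT INTO #titles VALUES ('XX0001', 'type_a', 10.00, 1000.00);
INSERT INTO #titles VALUES ('XX0002', 'type_a', 20.00, 3000.00);
INSERT INTO #titles VALUES ('XX0003', 'type_b', NULL,  NULL);

-- One row per title; AVG ... OVER (PARTITION BY type) repeats each
-- group's averages on every row, like the extended-column output.
SELECT type,
       title_id,
       AVG(price)   OVER (PARTITION BY type) AS avg_price,
       AVG(advance) OVER (PARTITION BY type) AS avg_advance
FROM   #titles
ORDER  BY type, title_id;
```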
