Sql-server – Difference between a unique clustered index and a clustered primary key

I'd like to ask two supplementary/follow-on questions, further to this previous/existing question: Is there any tangible difference between a unique clustered index and a clustered primary key?

That question starts with, "I understand that there may be a difference in meaning or intent between the two". I'm a programmer not a DBA, and this fundamental might be unclear to me: what is the difference in meaning or intent between the two?
My summary of the accepted answer, i.e. its most important statement IMO, is that it says, "I don't think there's any difference". If that's so then why did Microsoft implement "clustered indexes"? Why not just say instead, "It's always clustered on the primary key, and you should define as the primary key whatever you want it to be clustered on"?

It seems to me that a primary key (already) is a unique clustered index.

Furthermore, here's a specific problem by way of example.

Let's say I have a table of Users (with a userId as its primary key), and a second table (e.g. Items) which defines items owned by each user. A user can own many items; each item is owned by one user, and has an itemId.

So the itemId could be the primary key of the Items table; and each row in the Items table (which has an itemId) also has a userId to identify its owner.

That's a good way to define a 1-N relationship, isn't it? Assume a foreign key contraint on userId, with Users being the parent table.

At run-time I usually want to retrieve all the items owned by a user, therefore the Items table should be clustered on its userId column.

[Users]
  userId
  + plus other user-specific fields

[Items]
  userId
  itemId
  + plus other item-specific fields

I think there are two ways to define this Items table:

itemId is primary non-clustered key, and (userId,itemId) is unique clustered index

… or:

(userId,itemId) is primary clustered key, and itemId is unique non-clustered index.

Which of the above two is better or more correct, semantically and/or practically, and why?

In case it makes a difference, the itemID is an artificial key: its purpose is to disambiguate/identify the item (and/or identify the item, within the set of items owned by the user).

'Physically' the itemID is probably globally unique (or unique within the table, anyway): because databases make it easy to create a globally-unique artificial key.
'Logically' I wouldn't mind if it were not-globally-unique, but were instead only unique-within-each-user, such that I needed both userId and itemId to uniquely identify an item, i.e.

(userId,itemId) is primary clustered key and itemId is unique non-clustered index.

So I think it isn't altogether wrong to see (userId,itemId) as a composite primary key?

Apparently it's fine and normal to use "two separate attributes" as the primary key of an associative table when it's an N-to-N relationship. Is it wrong (e.g. harmful in some way, for some reason) to use two attributes as the primary key of an object in a 1-to-N relationship? Is it wrong to say that the owner ID is part of the object's identity?

Best Answer

why did Microsoft implement "clustered indexes"? Why not just say instead, "It's always clustered on the primary key, and you should define as the primary key whatever you want it to be clustered on"?

To answer this question: The primary key on the table does not have to be the Clustered Index Key. A Clustered Index is the place where SQL Server stores the data of the entire table. The Index itself has what are called "key values." Key Values (1 to many) define how the index is ordered, along with statistics and other useful things.

In fact, often it's useful to have a Primary Key that is not the Clustered Index Key. For example, some applications create their own keys or GUIDs in order to relate different tables through foreign keys. Using a GUID as a Clustered Index Key is a bad idea though, since GUIDs don't always have an order, unless you create them sequentially. To solve this, you could create a Clustered Index on an identity column, and place the Primary Key on the GUID column.

Your question has many more questions. I think we need to break this down into many questions, since you're asking about a lot of index concepts.

Best Answer

Related Solutions

Sql-server – ny tangible difference between a unique clustered index and a clustered primary key

Sql-server – Primary key vs. unique index: performance difference with foreign keys

Related Question