You are correct to separate "clustered index" from "primary key":
- A clustered index is the organisation of data on disk is better if
- narrow
- numeric
- increasing (strictly monotonic)
- The primary key identifies a row
Note: GUIDs make poor clustering keys
In this case, with the surrogate column, the table has 2 candidate keys:
- ProductHistoryID
- ProductNo + CreatedDateTime
Assumed convention states that the ProductHistoryID becomes the PK, but you can leave the PK on (ProductNo, CreatedDateTime): it will just be non-clustered. Which leads to indexes:
- clustered index should be on ProductHistoryID
- unique non-clustered index on (ProductNo, CreatedDateTime)
Example
CREATE TABLE Product (
ProductHistoryID int NOT NULL IDENTITY (1,1) NOT NULL,
ProductNo ...
CreatedDateTime ...
then you a choice of
CONSTRAINT PK_Product PRIMARY KEY CLUSTERED (ProductHistoryID)
CONSTRAINT UQ_Product UNIQUE NONCLUSTERED (ProductHistoryID)
or
CONSTRAINT PK_Product PRIMARY KEY NONCLUSTERED (ProductNo, CreatedDateTime)
CONSTRAINT PK_Product UNIQUE CLUSTERED (ProductHistoryID)
Also, the pattern you have is a "type 2 Slowly Changing Dimension"
In many cases, there are more than one way to join two tables; See the other answers for lots of examples. Of course, one could say that it would be an error to use the 'automatic join' in those cases. Then only a handfull of simple cases where it can be used would be left.
However, there is a severe drawback! Queries that are correct today, might become an error tomorrow just by adding a second FK to the same table!
Let me say that again: by adding columns, queries that do not use those columns could turn from 'correct' into 'error'!
That is such a maintenance nightmare, that any sane style guide would prohibit to use this feature. Most already prohibit select *
for the same reason!
All this would be acceptable, if performance would be enhanced. However, that's not the case.
Summarizing, this feature could be used in only a limited set of simple cases, does not increase performance, and most style guides would prohibit its usage anyway.
Therefor it is not supprising that most database vendors choose to spend their time on more important things.
Best Answer
You need to define the criteria for what constitutes a "possible primary key". When you have done that, you/we can search for (for instance sp_special_columns) or produce a script that uses that criteria. Such criteria could be:
For 3 and 4, you are gambling, since having no duplicates today is not guarantee that there should be duplicates tomorrow.
In general, there is a risk messing about in a database unless you own the database. Say that you slap on an identity column on each table, make it the PK, just so you can use replication. Now, there might be some code that does SELECT * and breaks down because it now gets this additional column that you added.