Primary Key Value Change – Why Would a Primary Key Value Change?

database-designprimary-key

I have been researching the concept of ROWGUID's recently and came across this question. This answer gave insight, but has led me down a different rabbit hole with the mention of changing the primary key value.

My understanding has always been that a primary key should be immutable, and my searching since reading this answer has only provided answers which reflect the same as a best practice.

Under what circumstances would a primary key value need to be altered after the record is created?

Best Answer

If you were using a person's name as a primary key and their name changed you would need to change the primary key. This is what ON UPDATE CASCADE is used for since it essentially cascades the change down to all related tables that have foreign-key relationships to the primary key.

For example:

USE tempdb;
GO

CREATE TABLE dbo.People
(
    PersonKey VARCHAR(200) NOT NULL
        CONSTRAINT PK_People
        PRIMARY KEY CLUSTERED
    , BirthDate DATE NULL
) ON [PRIMARY];

CREATE TABLE dbo.PeopleAKA
(
    PersonAKAKey VARCHAR(200) NOT NULL
        CONSTRAINT PK_PeopleAKA
        PRIMARY KEY CLUSTERED
    , PersonKey VARCHAR(200) NOT NULL
        CONSTRAINT FK_PeopleAKA_People
        FOREIGN KEY REFERENCES dbo.People(PersonKey)
        ON UPDATE CASCADE
) ON [PRIMARY];

INSERT INTO dbo.People(PersonKey, BirthDate)
VALUES ('Joe Black', '1776-01-01');

INSERT INTO dbo.PeopleAKA(PersonAKAKey, PersonKey)
VALUES ('Death', 'Joe Black');

A SELECT against both tables:

SELECT *
FROM dbo.People p
    INNER JOIN dbo.PeopleAKA pa ON p.PersonKey = pa.PersonKey;

Returns:

If we update the PersonKey column, and re-run the SELECT:

UPDATE dbo.People
SET PersonKey = 'Mr Joe Black'
WHERE PersonKey = 'Joe Black';

SELECT *
FROM dbo.People p
    INNER JOIN dbo.PeopleAKA pa ON p.PersonKey = pa.PersonKey;

we see:

Looking at the plan for the above UPDATE statement, we clearly see both tables are updated by a single update statement by virtue of the foreign key defined as ON UPDATE CASCADE:

^{click the image above to see it in more clarity}

Finally, we'll cleanup our temporary tables:

DROP TABLE dbo.PeopleAKA;
DROP TABLE dbo.People;

The preferred¹ way to do this using surrogate keys would be:

USE tempdb;
GO

CREATE TABLE dbo.People
(
    PersonID INT NOT NULL IDENTITY(1,1)
        CONSTRAINT PK_People
        PRIMARY KEY CLUSTERED
    , PersonName VARCHAR(200) NOT NULL
    , BirthDate DATE NULL
) ON [PRIMARY];

CREATE TABLE dbo.PeopleAKA
(
    PersonAKAID INT NOT NULL IDENTITY(1,1)
        CONSTRAINT PK_PeopleAKA
        PRIMARY KEY CLUSTERED
    , PersonAKAName VARCHAR(200) NOT NULL
    , PersonID INT NOT NULL
        CONSTRAINT FK_PeopleAKA_People
        FOREIGN KEY REFERENCES dbo.People(PersonID)
        ON UPDATE CASCADE
) ON [PRIMARY];

INSERT INTO dbo.People(PersonName, BirthDate)
VALUES ('Joe Black', '1776-01-01');

INSERT INTO dbo.PeopleAKA(PersonID, PersonAKAName)
VALUES (1, 'Death');

SELECT *
FROM dbo.People p
    INNER JOIN dbo.PeopleAKA pa ON p.PersonID = pa.PersonID;

UPDATE dbo.People
SET PersonName = 'Mr Joe Black'
WHERE PersonID = 1;

For completeness, the plan for the update statement is very simple, and shows one advantage to surrogate keys, namely only a single row needs to be updated as opposed to every row containing the key in a natural-key scenario:

SELECT *
FROM dbo.People p
    INNER JOIN dbo.PeopleAKA pa ON p.PersonID = pa.PersonID;

DROP TABLE dbo.PeopleAKA;
DROP TABLE dbo.People;

The output from the two SELECT statements above are:

Essentially, the outcome is approximately the same. One major difference is the wide natural key is not repeated in every table where the foreign key occurs. In my example, I'm using a VARCHAR(200) column to hold the person's name, which necessitates using a VARCHAR(200) everywhere. If there are a lot of rows and a lot of tables containing the foreign key, that will add up to a lot of wasted memory. Note, I'm not talking about disk space being wasted since most people say disk space is so cheap as to be essentially free. Memory, however, is expensive and deserves to be cherished. Using a 4-byte integer for the key will save a large amount of memory when you consider the average name length of around 15 characters.

Tangential to the question about how and why keys can change is the question about why to choose natural keys over surrogate keys, which is an interesting and perhaps more important question, especially where performance is a design-goal. See my question here about that.

^{1 - http://weblogs.sqlteam.com/mladenp/archive/2009/10/06/Why-I-prefer-surrogate-keys-instead-of-natural-keys-in.aspx}

Related Solutions

Primary Key – Performance of Multi Field Primary Key vs ‘Contrived’ Key

I'll answer obliquely...

The natural key is always the natural key and should be enforced with a unique constraint or index. This is the "primary key" that flows from your modelling phase.

The choice of an auto-number/identity surrogate key matters at implementation phase because there are good and bad choices for your clustered index (example: SQL Server, Sybase, MySQL InnoDB, Oracle IOT).

That is, primary key is orthogonal to your clustered index: don't confuse the two issues

I'd suggest using a contrived key adds no value over using an auto-number/identity column in this respect. You lose data from the natural key, probably won't be unique, is just as opaque.

FWIW, I use surrogate keys and composite keys when I need too:

Some natural keys are useful in their own right: ISO currency and country codes
A table with no secondary (non-clustered) indexes and no child table doesn't benefit from a surrogate key
If you have parent-child-grandchild, then I usually need to join parent-grandchild: with composite keys I can do so directly. Simpler JOINs, simpler indexes

Note: this assumes that every table requires a clustered index

Using indexes to create multiple relationships to a table in order to enforce data integrity and add meaning

Situation 1:

Your tables have one relationship and not two. (example: a Device belongs to a DeviceType)

So, keep only one relationship, the one with the composite keys (that include the Primary Key). The other relationship is redundant when the composite one is defined.

I would also suggest you have same names for related columns:

DeviceCategory Table
CategoryCode | Name        | Description
----------------------------------------
WKS          | Workstation | Description of what classifies an item as a workstation...
LPT          | Laptop      | Description of what classifies an item as a laptop...

DeviceType Table
DeviceTypeID | CategoryCode | Manufacturer | Model  | IsTrackedInOtherSystemDefault
-----------------------------------------------------------------------------------
1            | WKS          | Dell         | GX1000 | true
2            | LPT          | HP           | dv4000 | false
3            | WKS          | HP           | xx9000 | false

Device Table
DeviceID | SerialNumber | DeviceTypeID | IsTrackedInOtherSystem | CategoryCode 
------------------------------------------------------------------------------
1        | I81U812      | 1            | true                   | WKS
2        | N0S4A2       | 1            | false                  | WKS
3        | 3BL1NDMIC3   | 2            | false                  | LPT

So, the design would be:

DeviceCategory
--------------
CategoryCode   PK
Name           U1
Description

DeviceType 
----------
DeviceTypeID   PK   U1
CategoryCode   FK   U1
Manufacturer   U2
Model          U2
IsTrackedInOtherSystemDefault

Device 
------
DeviceID       PK   U1
SerialNumber   U2
DeviceTypeID   FK1
IsTrackedInOtherSystem
CategoryCode   FK1  U1

and for the Computer:

Computer
--------
DeviceID       PK   FK1
Hostname       U1
IPAddress      U2
CategoryCode   FK1  CHK

The "additional" UNIQUE keys (the two composite U1 ones) will be needed in most DBMS to enforce the foreign key constraints. I guess this answers your question 2, relationships needs indices to be enforced, so (you have to) use them. They will be used by the DBMS not only to enforce integrity but in your queries/statements, when you will be joining the tables.

The only one that is not needed is the U3 you had in the Computer table.

About question 3 (the over-engineering part): No, I don't think so but that's just my opinion. And you haven't told us if this is a homework/exercise or a real project, whether you will be holding only your family's or a multi-million company's inventory, etc.

Situation 2

I think what you have is fine and there is no need (and not a good idea) to have referential integrity constraints on these columns. This is a default value that is copied in the second table via a stored procedure (I guess during Inserts on the second table?) or altered by a user. If you add an FK, won't that deny users the ability to override the default?

The names of the two columns are self-explanatory enough for a DBA to understand the functionality.

Best Answer

Related Solutions

Primary Key – Performance of Multi Field Primary Key vs ‘Contrived’ Key

Using indexes to create multiple relationships to a table in order to enforce data integrity and add meaning

Related Question