Primary keys: composite and singular

database-design

In general when I design tables I always include a single unique field, even if it there is already a unique combination of columns available to create a primary key. This field is usually either a UUID (if application-generated) or a sequence (if database-generated).

Some questions:

  • are there any downsides to this approach (apart from storage)?
  • the sole exception to this rule is (for me) the join table that exists only to join other tables. Is there an argument to be made that a join table should also have a single field acting as primary key?

Best Answer

The benefits of using a surrogate key (that's your newly created single field), in my opinion, outweigh the downsides in most cases.

But there are a few cases that you may experience some drawbacks.

Think of a company's employees that work different shifts on different days of the week. Here's a simplified database:

SHIFT(ShiftID, Shift, SomeOtherColumn);
EMPLOYEE(EmployeeID, FirstName, LastName);
[TRANSACTION](TransactionID, ShiftID, EmployeeID, Date);

So, as constructed, we've created a surrogate key for each table. If we need to, say, find out what shift correspond to a particular shiftID from a particular transactionID, we'd have to run a second query. That's the downside of using a surrogate key in that case.

If the fields Shift and SomeOtherColumn from the SHIFT table can be assumed to be a candidate key (a composite primary key, in other words), then we would have to declare both fields in the our FOREIGN KEYconstraint DDL when we created the [TRANSACTION] table. So, we would be able to directly query the actual Shift from the [TRANSACTION] table without having to run an additional query.

Another issue with using a catch-all surrogate key is that it has no intrinsic value. So, if you work for a larger organization, but there are different databases (in other departments, for examples) that refer to the same items with different primary keys (surrogate keys), you may have a bit of an issue if you have join their data regularly.

Ditto for the developer who has 2 or 3 apps, maintains a different user database for each, with a different surrogate ID for each user but may have a ton of duplicate users between them.

Personally, I go on a case by case basis, but lean toward using a surrogate key more unless there is an outstanding reason I should not.