Should every table have a single-field surrogate/artificial primary key

database-designprimary-keysurrogate-key

I understand one benefit of surrogate/artificial keys in general – they do not change and that can be very convenient. This is true whether they are single or multiple field – as long as they are 'artificial'.

However, it sometimes seems to be a matter of policy to have an auto-incrementing integer field as the primary key of each table. Is this always the best idea to have such a single-field key and why (or why not)?

To be clear, this question is not about artificial vs natural – but about whether all artificial keys should be single-field

Best Answer

I'm going to say no, not always, but most of the time yes..

These are some circumstances in which you don't need a surrogate or artificial key:

  • Pure intersection tables. If there is no risk of the intersection being the target of a foreign key and if there is little or no risk of the intersection attracting independent attributes (i.e. something other than FK's to the two parent tables) then you can get away with using the combination of FKs as the PK with fair confidence.
  • Lookup tables with static business keys. If you have a lookup
    table with a unique business key which is fixed externally to your
    business and which has zero chance of ever changing for any
    practical purpose, then using the business key directly can make
    things simpler. An example might be a list of state or province
    codes or a list of ANSI standard numbers, etc.
  • Tables containing data consolidated from multiple, independent sources. If your system has many sources of data that must be shoehorned together into a single table, say at head office, then sometimes you need a compound key that includes the source system key value and a code indicating what the source system was.

There are also some situations where the old-faithful monotonically increasing integer surrogate key is not ideal. You can have keys that are alphanumeric surrogates. These could include:

  • Situations where you need to merge data from multiple, independent sources. To avoid key collisions you might use GUIDs instead of IDENTITY keys.
  • Situations where you are forced to use non-numeric key representations. Let's say you've got a license plate database. Your key could be the alphanumeric value instead of a pure number.
  • Situations where some external requirement forces you to apply compression to your key value. Instead of using 10 digits for an int32 you can use six base 36 digits.

Why most of the time yes? The most fundamental answer to that question is that it is pure hell if you ever need to modify a primary key value on any table. Since almost anything a user can see or touch is conceivably subject to an update at some point, using a visible key value is inviting pure hell. Using a surrogate key will keep you from falling into this trap.

Having said that, remember that there is room for YAGNI in applying this concept. You don't need to go forcing code tables with IDENTITY keys into every nook and cranny of your schema, just in case someone decides that the symbol for male gender in your employee table needs to change from M to X or something silly.