Foreign keys – link using surrogate or natural key

database-design, foreign-key, natural-key, surrogate-key

Is there a best practice for whether a foreign key between tables should link to a natural key or a surrogate key? The only discussion I've really found (unless my google-fu is lacking) is Jack Douglas' answer in this question, and his reasoning seems sound to me. I'm aware of the further argument that rules change, but that is something that would need to be considered in any situation.

The main reason for asking is that I have a legacy application that makes use of FKs with natural keys, but there is a strong push from developers to move to an OR/M (NHibernate in our case), and a fork has already produced some breaking changes. So I'm looking to either push them back on track using the natural key, or move the legacy app to use surrogate keys for the FK. My gut says to restore the original FK, but I'm honestly not sure if this is really the right path to follow.

The majority of our tables already have both a surrogate and a natural key defined (through a unique constraint and the PK), so having to add extra columns is a non-issue for us in this instance. We're using SQL Server 2008, but I'd hope this is generic enough for any DB.

Best Answer

Neither SQL nor the relational model is disturbed by foreign keys that reference a natural key. In fact, referencing natural keys often dramatically improves performance. You'd be surprised how often the information you need is completely contained in a natural key; referencing that key trades a join for a wider table (and consequently reduces the number of rows you can store in one page).

By definition, the information you need is always completely contained in the natural key of every "lookup" table. (The term lookup table is informal. In the relational model, all tables are just tables. A table of US postal codes might have rows that look like this: {AK, Alaska}, {AL, Alabama}, {AZ, Arizona}, etc. Most people would call that a lookup table.)
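To make the "trades a join for a wider table" point concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module rather than SQL Server so that it runs anywhere, and the table and column names (states, addresses, state_code, and so on) are hypothetical. The lookup table has both a surrogate key and a natural key, and the referencing table points at the natural key.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server; all names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

conn.executescript("""
CREATE TABLE states (
    state_id   INTEGER PRIMARY KEY,     -- surrogate key
    state_code TEXT NOT NULL UNIQUE,    -- natural key
    state_name TEXT NOT NULL UNIQUE
);
CREATE TABLE addresses (
    address_id INTEGER PRIMARY KEY,
    city       TEXT NOT NULL,
    -- the foreign key references the natural key, not state_id
    state_code TEXT NOT NULL REFERENCES states (state_code)
);
INSERT INTO states (state_code, state_name) VALUES
    ('AK', 'Alaska'), ('AL', 'Alabama'), ('AZ', 'Arizona');
INSERT INTO addresses (city, state_code) VALUES
    ('Juneau', 'AK'), ('Phoenix', 'AZ');
""")

# The state code is already in the referencing table, so no join is needed.
rows = conn.execute(
    "SELECT city, state_code FROM addresses ORDER BY city"
).fetchall()

# Referential integrity still holds: an unknown code is rejected.
try:
    conn.execute("INSERT INTO addresses (city, state_code) VALUES ('Nowhere', 'ZZ')")
    fk_enforced = False
except sqlite3.IntegrityError:
    fk_enforced = True
```

Had addresses stored state_id instead, the same query would need a join to states just to print the two-letter code.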

On big systems, it's not unusual to find tables that have more than one candidate key. It's also not unusual for tables that serve one part of the enterprise to reference one candidate key, and tables that serve another part of the enterprise to reference a different candidate key. This is one of the strengths of the relational model, and it's a part of the relational model that SQL supports pretty well.
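As a rough illustration of two parts of a system referencing different candidate keys of the same table, the sketch below again uses Python's sqlite3 module as a stand-in for SQL Server. The products, order_lines, and stock_levels tables are hypothetical: one child table references the surrogate key (as an ORM-managed table might) and the other references the natural key.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server; all names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

conn.executescript("""
CREATE TABLE products (
    product_id INTEGER PRIMARY KEY,     -- surrogate candidate key
    sku        TEXT NOT NULL UNIQUE,    -- natural candidate key
    name       TEXT NOT NULL
);
-- one part of the system references the surrogate key
CREATE TABLE order_lines (
    order_line_id INTEGER PRIMARY KEY,
    product_id    INTEGER NOT NULL REFERENCES products (product_id),
    quantity      INTEGER NOT NULL
);
-- another part references the natural key
CREATE TABLE stock_levels (
    sku     TEXT PRIMARY KEY REFERENCES products (sku),
    on_hand INTEGER NOT NULL
);
INSERT INTO products (product_id, sku, name) VALUES (1, 'WID-001', 'Widget');
INSERT INTO order_lines (product_id, quantity) VALUES (1, 5);
INSERT INTO stock_levels (sku, on_hand) VALUES ('WID-001', 40);
""")

ordered = conn.execute("SELECT product_id, quantity FROM order_lines").fetchall()
stocked = conn.execute("SELECT sku, on_hand FROM stock_levels").fetchall()


def rejected(sql):
    """Return True if the statement violates a constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


# Both foreign keys are enforced, each against its own candidate key.
bad_surrogate = rejected("INSERT INTO order_lines (product_id, quantity) VALUES (99, 1)")
bad_natural = rejected("INSERT INTO stock_levels (sku, on_hand) VALUES ('NOPE', 1)")
```

Both references are ordinary foreign keys; the only requirement is that the referenced column is declared as a key (PRIMARY KEY or UNIQUE).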

You'll run into two problems when you reference natural keys in tables that also have a surrogate key.

First, you'll surprise people. Although I usually lobby strongly for the Principle of Least Surprise, this is one situation where I don't mind surprising people. When the problem is that developers are surprised by the logical use of foreign keys, the solution is education, not redesign.

Second, ORMs aren't generally designed around the relational model, and they sometimes embody assumptions that don't reflect best practice. (In fact, they often seem to be designed without ever having input from a database professional.) Requiring an ID number in every table is one of those assumptions. Another one is assuming that the ORM application "owns" the database. (So it's free to create, drop, and rename tables and columns.)

I have worked on a database system that served data to hundreds of application programs written in at least two dozen languages over a period of 30 years. That database belongs to the enterprise, not to an ORM.

A fork that introduces breaking changes should be a show-stopper.

I measured performance with both natural keys and surrogate keys at a company I used to work at. There's a tipping point at which surrogate keys begin to outperform natural keys. (Assuming no additional effort to keep natural key performance high, like partitioning, partial indexes, function-based indexes, extra tablespaces, using solid-state disks, etc.) By my estimates for that company, they'll reach that tipping point in about 2045. In the meantime, they get better performance with natural keys.

Other relevant answers: In Database Schema Confusing