Database Lookup Tables – Best Practices for Relational Databases

Tags: best practices, database-design, foreign key, primary-key

Lookup tables (or code tables, as some people call them) are usually a collection of the possible values that can be given for a certain column.

For example, suppose we have a lookup table called party (meant to store information about political parties) that has two columns:

  • party_code_idn, which holds system-generated numeric values and, lacking business domain meaning, works as a surrogate for the real key.
  • party_code, which is the real or “natural” key of the table because it holds values that carry business domain meaning.

And let us say that the table holds the following data:

 +----------------+------------+
 | party_code_idn | party_code |
 +----------------+------------+
 |              1 | Republican |
 |              2 | Democratic |
 +----------------+------------+

The party_code column, which holds the values 'Republican' and 'Democratic', is the real key of the table and is set up with a UNIQUE constraint. I optionally added party_code_idn and defined it as the PRIMARY KEY (PK) of the table, although, logically speaking, party_code could serve as the PK.
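A minimal DDL sketch of that setup, assuming standard SQL (the exact surrogate-generation syntax, such as IDENTITY or AUTO_INCREMENT, varies by platform):

 CREATE TABLE party (
     party_code_idn INTEGER     NOT NULL,  -- surrogate key, system-generated
     party_code     VARCHAR(30) NOT NULL,  -- natural key, carries business meaning
     CONSTRAINT party_pk PRIMARY KEY (party_code_idn),
     CONSTRAINT party_ak UNIQUE (party_code)
 );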

Question

What are the best practices for pointing to lookup values from transaction tables? Should I establish FOREIGN KEY (FK) references either (a) directly to the natural and meaningful value or (b) to surrogate values?

Option (a), for example,

 +---------------+------------+---------+
 | candidate_idn | party_code |  city   |
 +---------------+------------+---------+
 |             1 | Democratic | Alaska  |
 |             2 | Republican | Memphis |
 +---------------+------------+---------+

has the following properties¹:

  1. Readable for the end user (+)
  2. Easy to import-export across systems (+)
  3. Changing a value is difficult, as it requires modification in all referring tables (-)
  4. Adding a new value is not costly (=)

I think it is almost like “pass by value”, to borrow an analogy from function calls in application programming jargon.
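A sketch of option (a), reusing the table and column names from the examples above; the FK can reference the natural key because party_code carries a UNIQUE constraint:

 CREATE TABLE candidate (
     candidate_idn INTEGER     NOT NULL,
     party_code    VARCHAR(30) NOT NULL,  -- stores the readable value itself
     city          VARCHAR(30) NOT NULL,
     CONSTRAINT candidate_pk PRIMARY KEY (candidate_idn),
     CONSTRAINT candidate_party_fk FOREIGN KEY (party_code)
         REFERENCES party (party_code)
         ON UPDATE CASCADE  -- optional; softens drawback 3 where the platform supports it
 );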

Option (b), for instance,

 +---------------+----------------+---------+
 | candidate_idn | party_code_idn |  city   |
 +---------------+----------------+---------+
 |             1 |              1 | Alaska  |
 |             2 |              2 | Memphis |
 +---------------+----------------+---------+

has the properties below:

  1. Not readable for the end user (-)
  2. Difficult to import-export, as we need to dereference it (-)
  3. Easy to change values, as we only store references in transaction tables (+)
  4. Adding a new value is not costly (=)

It is very similar to “pass by reference”, to continue the function-call analogy from application programming.
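The corresponding sketch for option (b), again only illustrative, as an alternative to the option (a) definition above; the transaction table stores nothing but the surrogate reference:

 CREATE TABLE candidate (
     candidate_idn  INTEGER     NOT NULL,
     party_code_idn INTEGER     NOT NULL,  -- just a pointer into the lookup table
     city           VARCHAR(30) NOT NULL,
     CONSTRAINT candidate_pk PRIMARY KEY (candidate_idn),
     CONSTRAINT candidate_party_fk FOREIGN KEY (party_code_idn)
         REFERENCES party (party_code_idn)
 );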

Import-export can also be done in a different way, i.e., by repopulating the lookup table and then re-seeding the surrogate column on the target system. I hope I am getting this right; it is something I have only heard of as a possibility.
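If I understand that idea correctly, an import under option (b) might look roughly like this: the exported rows carry the natural key, and the surrogate is resolved against the target's lookup table at load time (staging_candidate is a hypothetical staging table, not something from the schema above):

 INSERT INTO candidate (candidate_idn, party_code_idn, city)
 SELECT s.candidate_idn,
        p.party_code_idn,
        s.city
 FROM   staging_candidate AS s   -- staged rows carry party_code as text
 JOIN   party             AS p
        ON p.party_code = s.party_code;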

¹ +, - and = indicate whether a property is a benefit, a drawback, or neutral.

Question

Quite importantly: is there a difference between a lookup (or code) table and an FK reference if we are just going to use the latter approach? I think they work just the same.


Best Answer

By IDN, I take it you mean an IDENTITY, SEQUENCE or AUTO_INCREMENT field? You should take a look here and here.

Note section 5 (Misusing Data Values as Data Elements) of the first reference; underneath figure 10 it says:

Of course you can have a separate table for the sales persons and then reference it using a foreign key, preferably with a simple surrogate key such as sales_person_id, shown above.

So, this expert thinks that you should "dereference" surrogate keys. It is really quite a basic SQL technique and shouldn't cause problems in your day-to-day SQL. It appears that there is an error in figure 10 - the sales_person in SalesData should be a surrogate key (i.e. a number), not text. I'm inferring this from the quote above.
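Using the tables from your question, dereferencing the surrogate is nothing more than a join; a rough sketch, assuming the option (b) design:

 SELECT c.candidate_idn,
        p.party_code,   -- the readable value comes back via the join
        c.city
 FROM   candidate AS c
 JOIN   party     AS p
        ON p.party_code_idn = c.party_code_idn;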

What you should avoid at all costs is the temptation (very common for novice database programmers) to commit the error outlined in section (1), Common Lookup Tables. This is commonly called the MUCK (Massively Unified Code Key) approach (not by accident :-), notably by Joe Celko, and it is also sarcastically known as the OTLT (One True Lookup Table); it leads to all sorts of difficulties. Novice programmers appear to feel that a single code/lookup/whatever table is "cleaner" and will be more efficient, when nothing could be further from the truth.

From the second reference above:

Normalization eliminates redundant data, thus making the task of enforcing data integrity vastly simpler, but the process of creating a MUCK is something else entirely. MUCK's do not eliminate redundant data, rather they are an elimination of what are PERCEIVED to be redundant tables, but as I will demonstrate, fewer tables does not equal simplicity.
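For illustration only (these names are mine, not from either reference), a MUCK collapses every code list into one table of the widest possible type, which is exactly what makes it hard to constrain:

 CREATE TABLE common_lookup (          -- the anti-pattern: do not do this
     code_type  VARCHAR(30)  NOT NULL, -- 'PARTY', 'CITY', 'STATUS', ...
     code_value VARCHAR(100) NOT NULL, -- every value forced into one generic column
     CONSTRAINT common_lookup_pk PRIMARY KEY (code_type, code_value)
 );
 -- A foreign key from a transaction table can only reference the composite key as
 -- a whole, so making sure a "party" reference really is a party needs extra
 -- columns and checks in every referencing table.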

You might also want to take a look at the related EAV (Entity Attribute Value) paradigm, which I deal with here.