Lookup tables (or code tables, as some people call them) are usually a collection of the possible values that can be given for a certain column.
For example, suppose we have a lookup table called party
(meant to store information about political parties) that has two columns:
party_code_idn
, which holds system-generated numeric values, and (lacking business domain meaning) works as a surrogate for the real key.party_code
, is the real or “natural” key of the table because it maintains values that have business domain connotations.
And let us say that such table retains the data that follows:
+----------------+------------+
| party_code_idn | party_code |
+----------------+------------+
| 1 | Republican |
| 2 | Democratic |
+----------------+------------+
The party_code
column, which keeps the values 'Republican' and 'Democratic', being the real key of the table, is set up with a UNIQUE constraint, but I optionally added the party_code_idn
and defined it as the PK of the table (though, logically speaking, party_code
may work as the PRIMARY KEY [PK]).
Question
What are the best practices for pointing to lookup values from transaction tables? Should I establish FOREIGN KEY (FK) references either (a) directly to the natural and meaningful value or (b) to surrogate values?
Option (a), for example,
+---------------+------------+---------+
| candidate_idn | party_code | city |
+---------------+------------+---------+
| 1 | Democratic | Alaska |
| 2 | Republican | Memphis |
+---------------+------------+---------+
has the following properties1:
- Readable for the end user (+)
- Easy to import-export across systems (+)
- Difficult to change the value as it needs modification in all referring tables (-)
- Adding new value is not costly (=)
I think it is almost like “pass by value”, to draw an analogy from function call in application programming jargon.
Option (b), for instance,
+---------------+----------------+---------+
| candidate_idn | party_code_idn | city |
+---------------+----------------+---------+
| 1 | 1 | Alaska |
| 2 | 2 | Memphis |
+---------------+----------------+---------+
has the properties below:
- Not readable for the end user (-)
- Difficult to import-export as we need to de-reference it (-)
- Easy to change values, as we are only storing references in transaction tables (+)
- Adding new value is not costly (=)
It is very similar to “pass by reference”, if comparing to function call in app programming parlance.
Import-Export can also be done in a different way, i.e., just by populating the look-up table again and then re-seed the surrogate column. I hope I am getting this right, this is something I have just heard as a possibility.
1. Note that +
, -
and =
indicate the benefit of those properties.
Question
Quite importantly: Is there a difference between a lookup (or code) table and a FK reference if we are just going to use the latter approach? I think they work just the same.
Best Answer
By
IDN
, I take it you mean anIDENTITY
,SEQUENCE
orAUTO_INCREMENT
field? You should take a look here and here.Note, section 5 (Misusing Data values as Data Elements) of the first reference, underneath figure 10
So, this expert thinks that you should "deference" surrogate keys. It is really quite a basic SQL technique and shouldn't cause problems in your day-to-day SQL. It appears that there is an error in figure 10 - the sales_person in SalesData should be a surrogate key (i.e. a number), not text. I'm inferring this from the quote above.
What you should avoid at all costs is the temptation (very common for novice database programmers) to commit the error outlined in section (1) Common Lookup Tables. This is commonly called the MUCK (Massively Unified Code Key) approach (not by accident :-) notably by Joe Celko, also sarcasticlly known as the OTLT - One True Lookup Table) and leads to all sorts of difficulties. Novice programmers appear to feel that a single code/lookup/whatever table is "cleaner" and will be more efficient when nothing could be further from the truth.
From the second reference above:
You might also want to take a look at the related EAV (Entity Attribute Value) paradigm which I deal with here.