Proper Use of Lookup Tables in Relational Databases

relational-theory

I'm having trouble figuring out exactly how to place good boundaries for when and where to use lookup tables in a database. Most sources I've looked at say that I can never have too many but, at some point, it seems like the database would be broken down into so many pieces that, while it may be efficient, it is no longer manageable. Here's a thrown together example of what I'm working with:

Let's say I have a table called Employees:

ID  LName   FName   Gender  Position
1   Doe     John    Male    Manager
2   Doe     Jane    Female  Sales
3   Smith   John    Male    Sales

Pretend for a moment that the data is more complex and contains hundreds of rows. The most obvious thing I see that could be moved to a lookup table would be Position. I could create a table called Positions and stick the foreign keys from the Positions table into the Employees table in the Position column.

ID  Position
1   Manager
2   Sales

But how far can I continue to break the information down into smaller lookup tables before it becomes unmanageable? I could create a Gender table and have a 1 correspond to Male and a 2 correspond to Female in a separate lookup table. I could even put LNames and FNames into tables. All "John" entries are replaced with a foreign key of 1 that points to the FName table that says an ID of 1 corresponds to John. If you go down this rabbit hole too far like this, though, your Employees table is then reduced to a mess of foreign keys:

ID  LName   FName   Gender  Position
1   1       1       1       1
2   1       2       2       2
3   2       1       1       2

While this might or might not be more efficient for a server to process, this is certainly unreadable to a normal person who may be trying to maintain it and makes it more difficult for an application developer trying to access it. So, my real question is how far is too far? Are there "best practices" for this sort of thing or a good set of guidelines somewhere? I can't find any information online that really nails down a good, useable set of guidelines for this particular issue I'm having. Database design is old hat to me but GOOD database design is very new so overly technical answers may be over my head. Any help would be appreciated!

Best Answer

But how far can I continue to break the information down into smaller lookup tables before it becomes unmanageable? I could create a Gender table and have a 1 correspond to Male and a 2 correspond to Female in a separate lookup table.

You're mixing two different issues. One issue is the use of a "lookup" table; the other is the use of surrogate keys (id numbers).

Start with this table.

ID  LName   FName   Gender  Position
1   Doe     John    Male    Manager
2   Doe     Jane    Female  Sales
3   Smith   John    Male    Sales

You can create a "lookup" table for positions like this.

create table positions (
  pos_name varchar(10) primary key
);

insert into positions
select distinct position 
from employees;

alter table employees
add constraint emp_fk1
foreign key (position) 
  references positions (pos_name);

Your original table looks exactly like it did before creating the "lookup" table. And the table of employees requires no additional joins to get useful, human-readable data out of it.

Using a "lookup" table boils down to this: Does your application need the control over input values that a foreign key reference provides? If so, then you can always use a "lookup" table. (Regardless of whether it uses a surrogate key.)

In some cases, you'll be able to completely populate that table at design time. In other cases, users need to be able to add rows to that table at run time. (And you'll probably need to include some administrative processes to review new data.) Gender, which actually has an ISO standard, can be completely populated at design time. Street names for international online product orders probably have to be added at run time.