Normalization: Should common columns of two tables be pulled out into a third table

best practicesdatabase-designnormalization

Should I further normalize my current design to eliminate redundancies?

I have the following tables:

empl: standard user table

empl_cat: employee "category"

empl_admin_cat: administrative "category"

Both of the category tables have a many-to-one relationship back to the empl table.

The employee category represents the categorization of the employee within the company.

The administrative category represents that employee's administrative oversight (if applicable) over other employees.

The tables are defined like this (irrelevant columns have been omitted):

empl (
    empl_id int PK,
    ...
)

empl_cat (
    empl_cat_id int PK,
    empl_id int FK,
    eff_date date,
    base_id int FK,
    eqp_id int FK,
    pos_id int FK
)

empl_admin_cat (
    admin_cat_id int PK,
    empl_id int FK,
    co_id int FK,
    base_id int FK,
    eqp_id int FK,
    pos_id int FK
)

I considered using a single table with a type column, but decided against it because they logically represent two different things, and while 3 of their columns are the same (base_id, eqp_id, pos_id), they each have one additional column (eff_date and co_id). Was this the right decision?

Would it be worthwhile creating another table that has an autoincrement Primary Key which holds all possible variations of base_id/eqp_id/pos_id, with the empl_cat and empl_admin_cat tables referencing the appropriate record in this new table instead of having their own base_id/eqp_id/pos_id columns?

Edit: The base_id, eqp_id and pos_id fields represent their "base of operations", "operating equipment", and "position". For example, SFO/777/Captain. The co_id in the empl_admin_cat represents a company. eff_date in empl_cat is an effective date.

Best Answer

I see 3 options

  1. Do nothing
  2. Superkey/subtype, that is put the common fields into empl
  3. Change to a self referencing table to build a hierarchy

Now,

  • Option 2 is similar to what you've described (mostly)
  • Option 3 is where you'd use a recursive CTE

However, I can't help thinking that Option 3 or some variety would be better. Perhaps some more info about what the columns mean? (Personally I don't abbreviate column names because the names are opaque)