3NF Decomposition – Example and Explanation

database-designnormalizationrelational-theory

I am trying to make sense of an example of 3NF decomposition using the 4-step algorithm mentioned by Ullman here, but I'm not understanding what my lecturer is doing with the last step (or, worse, I'm not understanding the algorithm itself).

I realize this is a bit of a newbie question, but I did all the googling but couldn't find anything illuminating and I've been sitting here scratching my head for a few hours now.

Now,

the algorithm in question is:

  1. Find a canonical cover F_c for F
  2. For each FD X->Y in F_c create a relation with schema XY
  3. Eliminate a relation if its schema is a subset of another
  4. If none of the schemas created so far contains a key or R add a relation schema containing a key of R.

Now, the example my lecturer gave us starts with

 R<A,B,C,D,E,F>
 CE->A, C->D, A->B, D->BE, B->F, AD->CF

Fc is shown to be

C->A, C->D, A->B, D->B, D->E, B->F, AD->C

and the keys are C and AD.

All's well until now.

Now we apply step 2 of the algorithm.

We have:


R_i:    A,B   |   D,B,E  |   F,B    |   C,D,A   |   A,D,C
     ---------+----------+----------+-----------+-----------
F_i    A->B   |   D->B,  |   B->F   |   C->D    |   AD->C
              |   D->E   |          |   C->A    |



Now this is were it gets ugly.
The lecturer remarks that the fifth one, R_5 is a subset
(a trivial subset, say I) of the fourth and proceeds to… "merge" them, leaving us with


R_i:    A,B   |   D,B,E  |   F,B    |   C,D,A   
     ---------+----------+----------+-----------
F_i    A->B   |   D->B,  |   B->F   |   C->D    
              |   D->E   |          |   C->A    
              |          |          |   AD->C   - - - - - notice this!

Notice how AD->C popped up in F_4.

The question is simply: why?

Why didn't he remove the fifth column altogether?

Is he applying step 3 of the algorithm?
But it only instructs to eliminate the offending schemas, it doesn't say anything about adding dependencies to other schemas.

Is he applying step 4 of the algorithm?
But it says that the schemas must contain a key of R, not all of them (which would then make some sense).

Is this simply a trivial mistake?
Is the final result correct anyway?

Thanks everybody.

Best Answer

The theory is all fine and good, but it only starts to really make sense once you understand the practice. The Lawyer's version of the first three Normal Forms is:

  1. The Key - there must be a Primary key for every relation being normalized.
  2. The Whole Key - There must not be any functional dependencies of attributes on any proper subset of the Primary Key.
  3. And Nothing but The Key - There must not be any functional dependencies of attributes on non-key attributes.

In practice relations frequently have multiple candidate keys, and these rules must apply to all of them in turn.

Also, when Normalizing one must determine the minimal key(s) for each relation, and use those rather than any Super Key.

Your lecturer didn't remove the fifth column altogether because the functional dependency in that column still exists, and must still be accounted for in the Normalization process.

Update:

The FD AD->C doesn't disappear by virtue of recognizing ADC as a subset of CDA; neither is it sensible to have two relations ADC and CDA both in the model as this is a redundancy of exactly the sort Normalization is designed to eliminate.