Database Design – How to Convert a Relation into BCNF

database-designnormalizationrelational-theory

I understand that to convert to BCNF, we need to take into account all violations and decompose as necessary with each FD.

For example, if we had R(A,B,C,D) with FD's AB->C, B->D, C->A. We can compute the closures of each, {AB}+ = ABCD, {B}+ = BD, {C}+ = CA.

And after going through the algorithm, a valid decomposition into BCNF would be R1(B,D), R2(C,A), R3(B,C).

Where I'm confused is that while this decomposition seems correct following the algorithm, how can it be correct if the first functional dependency AB->C does not seem to be satisfied?

Best Answer

The decomposition that you have produced is in effect correct, in the sense that the decomposed schemas are in BCNF.

However, as you have already noted, it does not preserve the dependencies, in particular the dependency AB → C is lost.

So you have re-discovered an important point about the decomposition in BCNF: one can always decompose a relation in BCNF, but at the price of sometimes losing one or more dependencies.

What does this mean in practice? We can lose (possibly important) constraints. In this case, for instance, the constraint that for each couple of values AB there is always a single value of C cannot be enforced on the resulting schema.

Can we do something for this? Well, we could decompose instead in 3NF, because the synthesis algorithm used for 3NF is guaranteed to produce always lossless and functional dependency preserving decompositions. In this case, for instance, it will produce the decomposition R1(A B C) and R2(B D), that maintains all the dependencies.

But, wait a moment! we have now a decomposition that does not eliminates all the redundancies eliminated with the BCNF algorithm, since, because of the dependency C → A, we will have the same value of A each time we have a certain value for C.

So, we have now a dilemma: should we prefer the 3NF that preserves the dependencies at the expense of maintaining some redundancy, or should we prefer the BCNF that reduces the redundancies at the expense of losing some “meaning” of the data?

The opinion of many is that we should choose the 3NF, since data meaning is considered more important that data redundancy (and not only this, but because the 3NF algorithm is a polynomial algorithm, while the BCNF algorithm is an exponential one).