Database Design – How to Convert a Relation into BCNF

database-designnormalizationrelational-theory

I understand that to convert to BCNF, we need to take into account all violations and decompose as necessary with each FD.

For example, if we had R(A,B,C,D) with FD's AB->C, B->D, C->A. We can compute the closures of each, {AB}+ = ABCD, {B}+ = BD, {C}+ = CA.

And after going through the algorithm, a valid decomposition into BCNF would be R1(B,D), R2(C,A), R3(B,C).

Where I'm confused is that while this decomposition seems correct following the algorithm, how can it be correct if the first functional dependency AB->C does not seem to be satisfied?

Best Answer

The decomposition that you have produced is in effect correct, in the sense that the decomposed schemas are in BCNF.

However, as you have already noted, it does not preserve the dependencies, in particular the dependency AB → C is lost.

So you have re-discovered an important point about the decomposition in BCNF: one can always decompose a relation in BCNF, but at the price of sometimes losing one or more dependencies.

What does this mean in practice? We can lose (possibly important) constraints. In this case, for instance, the constraint that for each couple of values AB there is always a single value of C cannot be enforced on the resulting schema.

Can we do something for this? Well, we could decompose instead in 3NF, because the synthesis algorithm used for 3NF is guaranteed to produce always lossless and functional dependency preserving decompositions. In this case, for instance, it will produce the decomposition R1(A B C) and R2(B D), that maintains all the dependencies.

But, wait a moment! we have now a decomposition that does not eliminates all the redundancies eliminated with the BCNF algorithm, since, because of the dependency C → A, we will have the same value of A each time we have a certain value for C.

So, we have now a dilemma: should we prefer the 3NF that preserves the dependencies at the expense of maintaining some redundancy, or should we prefer the BCNF that reduces the redundancies at the expense of losing some “meaning” of the data?

The opinion of many is that we should choose the 3NF, since data meaning is considered more important that data redundancy (and not only this, but because the 3NF algorithm is a polynomial algorithm, while the BCNF algorithm is an exponential one).

Related Solutions

Database functional dependency BCNF help

Look up "dependency preservation".

It is a known phenomenon that when decomposing relation schemas (along some given FD), certain other FD's can become inexpressible.

In a complete logical design, they have to be reinstated as a constraint on the database. However, normalization theory does not cover constraints. Normalization theory deals only with relation schemas, not with constraints between them.

If you first decompose over CD->E (leaving ABCD / CDE), you can still further decompose over A->D (leaving AD / ABC / CDE), but you will now have "lost" the BE->A dependency.

Normalization – How to Decompose Relation into 3NF Relations

First of all, note that the original relation is already in Third Normal Form, since each attribute is prime (each attribute is a key, actually), so that the definition of 3NF is respected.

Then, note that the algorithm is incomplete. The steps are:

Find a minimal basis of F, say G

For each groups of FD with the same left part, X → A₁, X → A₂, ..., X → A_n in G, use {X, A₁, A₂, ..., A_n} as the schema of one of the relations in the decomposition

Delete all the relations whose attributes are contained in another relation.

If none of the sets of relations from Step2 is a superkey for R, add another relation whose schema is a key for R.

So in the first case, you obtain three groups of dependencies:

A → B
B → A
B → C
C → B

that produce three relations, R₁(A, B), R₂(A, B, C), R₃(B, C), and, following the algorithm, you obtain as result only R₂, since the other two have attributes contained in them.

So you have two different outputs from the algorithm, depending on the minimal base used (which in turn depends on the order in which you consider the dependencies when calculating the minimal cover).

So, the answer to your question:

which one is correct?

is: both of them are correct, since both of them satisfies the definition of the 3NF. You have simply discovered that the synthesis algorithm for decomposing a relation in 3NF can produce different solutions.

A different question is: which is “better”, and of course the solution with a single relation is “better”, since you do not need to join tables when making queries.

Of course, if one could check at the beginning if the relation is already in 3NF, than he can avoid to apply the algorithm. But this in general cannot be done, since the check requires the exponential computation of all the keys, to find the prime attributes of the relation.

Best Answer

Related Solutions

Database functional dependency BCNF help

Normalization – How to Decompose Relation into 3NF Relations

Related Question