Normalization – How to Decompose Relation into 3NF Relations

normalizationrelational-theory

Today I read about the 3NF decomposition algorithm. It said:

Find a minimal basis of F, say G

For each FD X → A in G, use {X, A} as the schema of one of the
relations in the decomposition

If none of the sets of relations from Step2 is a superkey for R, add
another relation whose schema is a key for R

I want to decompose this relation into 3NF.

R(A,B,C)
S={A→B, A→C, B→A, B→C, C→A, C→ B, AB→ C, BC→A, AC→B, A→BC, B→AC, C→AB}

As we can see, the key of R is: {A},{B},{C}

S has several minimal basis, such as:

{A→B, B→A, B→C, C→B}; and
{A→B, B→C, C→A}

The problem is, if we use the 1st minimal basis, then we decompose R into 2 relations: (A, B), (B, C).

If we use the 2nd minimal basis, R turns into: (A, B), (B, C), (C, A).

My question is: which one is correct?

Best Answer

First of all, note that the original relation is already in Third Normal Form, since each attribute is prime (each attribute is a key, actually), so that the definition of 3NF is respected.

Then, note that the algorithm is incomplete. The steps are:

Find a minimal basis of F, say G

For each groups of FD with the same left part, X → A₁, X → A₂, ..., X → A_n in G, use {X, A₁, A₂, ..., A_n} as the schema of one of the relations in the decomposition

Delete all the relations whose attributes are contained in another relation.

If none of the sets of relations from Step2 is a superkey for R, add another relation whose schema is a key for R.

So in the first case, you obtain three groups of dependencies:

A → B
B → A
B → C
C → B

that produce three relations, R₁(A, B), R₂(A, B, C), R₃(B, C), and, following the algorithm, you obtain as result only R₂, since the other two have attributes contained in them.

So you have two different outputs from the algorithm, depending on the minimal base used (which in turn depends on the order in which you consider the dependencies when calculating the minimal cover).

So, the answer to your question:

which one is correct?

is: both of them are correct, since both of them satisfies the definition of the 3NF. You have simply discovered that the synthesis algorithm for decomposing a relation in 3NF can produce different solutions.

A different question is: which is “better”, and of course the solution with a single relation is “better”, since you do not need to join tables when making queries.

Of course, if one could check at the beginning if the relation is already in 3NF, than he can avoid to apply the algorithm. But this in general cannot be done, since the check requires the exponential computation of all the keys, to find the prime attributes of the relation.

Related Solutions

Why is this relation in 3NF

The mistake is in your understanding of transitive dependency. From wikipedia: Transitive dependency

In mathematics, a transitive dependency is a functional dependency which holds by virtue of transitivity. A transitive dependency can occur only in a relation that has three or more attributes. Let A, B, and C designate three distinct attributes (or distinct collections of attributes) in the relation. Suppose all three of the following conditions hold:

1. A → B
2. It is not the case that B → A
3. B → C

Then the functional dependency A → C (which follows from 1 and 3 by the axiom of transitivity) is a transitive dependency.

In your case though, (2) does not hold:

1. U → T                                  -- correct
2. It is not the case that T → U          -- wrong
3. T → V                                  -- correct

Therefore {V} is NOT transitively dependent on {U}.

Database Design – Decomposition of a Relation to 2NF and 3NF

The goal of 3NF (and BCNF) is to avoid situations when a set of attributes is functionally determined by something that is not a key (or, more formally, a superkey), which may lead to anomalies and redundancy.

Keeping that in mind, we can describe certain functional dependencies as either "bad" (if they prevent the relation from being in 3NF) or "good" (if they do not).

So your solution is wrong, because "your" R1 has the E->D dependency and E is not R1's key. Determining a key for a relation is another topic which, I suppose, is beyond the scope of the task you need to accomplish (we are told that ABC is a key). But if you are interested, you can look the method up using "attribute closure" keywords.

Anyway, the E->D dependency is "bad". But that is not the only problem. Another one is that your decomposition is not a "lossless-join" one. In other words, if we have some real world data, split the relation according to your proposition and try to join R0 and R1 back, the data will be corrupted. Data corruption is not what normalization is designed for.

We have to solve 2 problems:

Decide which dependencies are "bad" (the definition of normal forms is the answer to this)
Decide how to split relations to prevent data corruption (we have Heath's theorem to help us)

Getting back to your problem, finally.

In R, neither B, nor E is the key. So both B->E and E->D are "bad", and we need to take measures. On the contrary ABC->EF is "good", because ABC is a key.

Currently R is in 1NF, because B is a part of a key and functionally determines E, which is not a part of a key. This kind of situation is unacceptable in 2NF.

E->D dependency is not that "bad", because E is not a part of a key.

Another thing to mention at this point is that a set of dependencies {B->E, E->D} implies the dependency B->ED (according to so called Armstrong axiom of transitivity).

So we decide to split using B->ED dependency as "the worst" one. That's the way we do this, according to Heath:

R = ABCDEF
R0 = BED (B->E, E->D, B is a key)
R1 = ABCF (ABC->F, ABC is a key)

Now we have R1, which is in 3NF and R0, which is in 2NF (it has a transitive dependency B->E->D, which is unacceptable in 3NF).

We need to split R0 and to do so we have to choose a "victim" dependency among B->E and E->D. As we see, B->E is now a "good" one (B is a key) and E->D is still "bad".

Our split:

R0 = BED
R2 = BE (B->E, B is a key)
R3 = ED (E->D, E is a key)

R1, R2 and R3 are the final decomposition.

Best Answer

Related Solutions

Why is this relation in 3NF

Database Design – Decomposition of a Relation to 2NF and 3NF

Related Question