Are there relations that cannot be normalized to second normal form (2NF)

database-designnormalization

I came across this example that I can't normalize to 2NF, so I was wondering if relations that could not be normalized existed.

Here is the example relation:

  • R(A, B, C, D, E)

with the functional dependencies:

  • {A, B} → D
  • C → B
  • E → A

or, diagrammatically:

enter image description here

The primary key of that relation is {C, E}, and if we suppose that all the attributes are atomic, then R is in first normal form (1NF).

But how can we normalize it to 2NF?

Since A and B are not fully dependent on {C, E}, I feel like they should be set aside in two different relations, but then D isn't dependent on anything in R. What am I missing?

Best Answer

Note that the normalization of a relation is an operation that in general is performed to eliminate or reduce anomalies like redundancy, while in general preserving the meaning of the data, and more formally, preserving two essential properties: the dependencies holding on the original data, and the data itself (through the concept of lossless decomposition).

So, the normalization decomposes a relation in a set of relations that are in a so-called Normal Form, and the normal forms used in a decomposition are usually the Boyce-Codd Normal Form, the Third Normal Form, or even higher level ones. For these decompositions, there are formal algorithms that can be used, and that guarantees a set of properties, like being lossless and preserving the dependency (not in all cases, for instance it is known that certain relations cannot be decomposed in BCNF without the loss of some dependency).

Note that the second normal form is not used in practice, has only an historical interest, and there are no published formal algorithm since it is more weaker that the 3NF and the BCNF.

All these topics are discussed in detail in any good book on databases, and you can find even some of them free on line.

In your example, assuming that the dependencies that you shown are a cover of all the dependencies of the relation, the decomposition of R in the the relations R1(CE) and R2(ABD) does not preserves the functional dependencies (E->A and C->B are lost), and is not lossless.

A decomposition which preserves data and dependencies, and that can be found both with the synthesis algorithm to produce the 3NF, and with the analysis algorithm to produce the BCNF is the following (each relation schema is followed by a cover of the dependencies holding in it):

R1(A B D) {AB -> D}

R2(B C) {C -> B}

R3(A E) {E -> A}

R4(C E) {}

where each schema is in BCNF.

Of course, since a schema in BCNF is also in 2NF, the previous decomposition also respects the Second Normal Form. This answers to your original question: since each relation can be normalized at least in 3NF without loss of data or dependencies, each relation can be normalized at least in 2NF.