Anomalous Updates in Normalized Database

normalization

I've been trying to understand the three first normal forms of databases and seem to be missing something:

Imagine a database with one table consisting of dances. There are 5 attributes:

A Dance#, MaleDancerName, MaleDancerDateofBirth, FemaleDancerName, FemaleDancerDateofBirth

Names of different people can be identical, but we assume no people have the same name and the same birthdate. The same pair of dancers can dance several
times.

A sample database could look like this:

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+        +                +               +                  +             +
+ Dance# + MaleDancerName + MaleDancerDoB + FemaleDancerName +  FemaleDoB  +
+        +                +               +                  +             +
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
*        *                *               *                  *             *
*    1   *      Brown     *    3/7/1989   *     Cortez       *   5/2/1983  *
*        *                *               *                  *             *
****************************************************************************
*        *                *               *                  *             *
*    2   *      Howard    *    7/5/1978   *     Taylor       *   8/12/1990 *
*        *                *               *                  *             *
****************************************************************************
*        *                *               *                  *             *
*    3   *      Brown     *    1/4/1986   *     Taylor       *   8/12/1990 *
*        *                *               *                  *             *
****************************************************************************
*        *                *               *                  *             *
*    4   *      Meyer     *    2/1/1984   *     Andrews      *  11/10/1988 *
*        *                *               *                  *             *
****************************************************************************
*        *                *               *                  *             *
*    5   *      Brown     *    3/7/1989   *     Cortez       *   5/2/1983  *
*        *                *               *                  *             *
****************************************************************************

Dance# is the primary key – and the only candidate key.

Now, as far as I can tell, this database is in the first three normal forms:

1. Normal Form: Yes: The data is atomic.

2. Normal Form: Yes: No non-prime attribute is dependent on any proper subset of any candidate key.
( A non-prime attribute of a table is an attribute that is not a part of any candidate key of the table. There is only one candidate key, Dance#, and it doesn't have a subset)

3. Normal Form: Yes: All the attributes in the table are determined only by the candidate keys (there is only one of
them, Dance#) and not by any non-prime attributes. The name of the dancer does not determine the Date of Birth – e.g. Brown in Dance1 and Brown in Dance3 are not the same person, they have different birth dates.

Now, although this table is in all three normal forms, it has update anomalies: If we find out that Dancer Taylor was actually born on 9/12/1990, we have to correct this information in Dance#2 and Dance#3.

An obvious solution would be to split the table into two tables, one for dancers and one for dances.

However, just looking at the Normal Forms, the table seems to be okay.

Am I mistaken somewhere?

Best Answer

Normalization is the formal process for removing redundancy from relations by taking projections which when joined back form the original relational and thus eliminate some redundancy without data loss. It is the science underlying database design. The first three normal forms, and BCNF, deal specifically with eliminating redundancy due by ensuring that every non-trivial functional dependency is fully dependent only on candidate keys. Higher normal forms deal with other kinds of dependencies to further eliminate redundancies. Even when fully normalized (5NF is generally considered the "final" normal form although there are four others in the literature) redundancy can still remain as not all redundancies can be removed by taking projections.

Another tool to address eliminating redundancy is the principle of orthogonal design which states that two distinct relvars cannot have in them a tuple with the property that if it appears in the first relvar it must also appear in the second and vice versa. But this principle only addresses redundancy across relvars whereas normalization addresses redundancy within them so it doesn't help with your example.

Ultimately Date contends we just need more science to guide database design as that which we have today as you show isn't quite enough. One practical point to your example is that although there is redundancy, at least it can be controlled redundancy if a table is defined to hold the dancers, all key, with name and birth date. Then, name and birth date become a foreign key to the dances table, and that foreign key can be defined to cascade updates. Then, if a particular dancer's birthdate is found to be in error and corrected, the DBMS will automatically handle updating all the places in the dance table where that dancer was listed. Moving the control of the redundancy from the user to the system is a big step forward that you can get with today's SQL DBMS'.

All of this information is paraphrased from Date's excellent book Database Design and Relational Theory which provides a significant amount of thinking and detail around just this issue. It is indeed the case that we stand on the shoulders of giants.

Related Solutions

Why is this relation in 3NF

The mistake is in your understanding of transitive dependency. From wikipedia: Transitive dependency

In mathematics, a transitive dependency is a functional dependency which holds by virtue of transitivity. A transitive dependency can occur only in a relation that has three or more attributes. Let A, B, and C designate three distinct attributes (or distinct collections of attributes) in the relation. Suppose all three of the following conditions hold:

1. A → B
2. It is not the case that B → A
3. B → C

Then the functional dependency A → C (which follows from 1 and 3 by the axiom of transitivity) is a transitive dependency.

In your case though, (2) does not hold:

1. U → T                                  -- correct
2. It is not the case that T → U          -- wrong
3. T → V                                  -- correct

Therefore {V} is NOT transitively dependent on {U}.

Database Normalization – Does 3NF Require 2NF?

The normal forms are specified in such a way that for any normal form, the model is also meets the criteria for the lower numbered normal forms. So, yes if your model is in 3rd normal form, it is also in 2nd normal form and 1st normal form.

When normalizing a model, you normally work at resolving one normal form before the next. This makes normalization simpler and less error prone as you are only applying a few rules at at time. It is possible that an entity at a lower normal form is also at a higher normal form. Most 3rd normal form entities are also in 4th and 5th normal form.

4th, 5th, and 6th normal forms go beyond the functional dependence. Where applicable, they further simplify the data, at the expense of more tables. They reduce redundancy, and increase data integrity.

Best Answer

Related Solutions

Why is this relation in 3NF

Database Normalization – Does 3NF Require 2NF?

Related Question