Anomalous Updates in Normalized Database

normalization

I've been trying to understand the three first normal forms of databases and seem to be missing something:

Imagine a database with one table consisting of dances. There are 5 attributes:

A Dance#, MaleDancerName, MaleDancerDateofBirth, FemaleDancerName, FemaleDancerDateofBirth

Names of different people can be identical, but we assume no people have the same name and the same birthdate. The same pair of dancers can dance several
times.

A sample database could look like this:

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+        +                +               +                  +             +
+ Dance# + MaleDancerName + MaleDancerDoB + FemaleDancerName +  FemaleDoB  +
+        +                +               +                  +             +
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
*        *                *               *                  *             *
*    1   *      Brown     *    3/7/1989   *     Cortez       *   5/2/1983  *
*        *                *               *                  *             *
****************************************************************************
*        *                *               *                  *             *
*    2   *      Howard    *    7/5/1978   *     Taylor       *   8/12/1990 *
*        *                *               *                  *             *
****************************************************************************
*        *                *               *                  *             *
*    3   *      Brown     *    1/4/1986   *     Taylor       *   8/12/1990 *
*        *                *               *                  *             *
****************************************************************************
*        *                *               *                  *             *
*    4   *      Meyer     *    2/1/1984   *     Andrews      *  11/10/1988 *
*        *                *               *                  *             *
****************************************************************************
*        *                *               *                  *             *
*    5   *      Brown     *    3/7/1989   *     Cortez       *   5/2/1983  *
*        *                *               *                  *             *
****************************************************************************

Dance# is the primary key – and the only candidate key.

Now, as far as I can tell, this database is in the first three normal forms:

1. Normal Form: Yes: The data is atomic.

2. Normal Form: Yes: No non-prime attribute is dependent on any proper subset of any candidate key.
( A non-prime attribute of a table is an attribute that is not a part of any candidate key of the table. There is only one candidate key, Dance#, and it doesn't have a subset)

3. Normal Form: Yes: All the attributes in the table are determined only by the candidate keys (there is only one of
them, Dance#) and not by any non-prime attributes. The name of the dancer does not determine the Date of Birth – e.g. Brown in Dance1 and Brown in Dance3 are not the same person, they have different birth dates.

Now, although this table is in all three normal forms, it has update anomalies: If we find out that Dancer Taylor was actually born on 9/12/1990, we have to correct this information in Dance#2 and Dance#3.

An obvious solution would be to split the table into two tables, one for dancers and one for dances.

However, just looking at the Normal Forms, the table seems to be okay.

Am I mistaken somewhere?

Best Answer

Normalization is the formal process for removing redundancy from relations by taking projections which when joined back form the original relational and thus eliminate some redundancy without data loss. It is the science underlying database design. The first three normal forms, and BCNF, deal specifically with eliminating redundancy due by ensuring that every non-trivial functional dependency is fully dependent only on candidate keys. Higher normal forms deal with other kinds of dependencies to further eliminate redundancies. Even when fully normalized (5NF is generally considered the "final" normal form although there are four others in the literature) redundancy can still remain as not all redundancies can be removed by taking projections.

Another tool to address eliminating redundancy is the principle of orthogonal design which states that two distinct relvars cannot have in them a tuple with the property that if it appears in the first relvar it must also appear in the second and vice versa. But this principle only addresses redundancy across relvars whereas normalization addresses redundancy within them so it doesn't help with your example.

Ultimately Date contends we just need more science to guide database design as that which we have today as you show isn't quite enough. One practical point to your example is that although there is redundancy, at least it can be controlled redundancy if a table is defined to hold the dancers, all key, with name and birth date. Then, name and birth date become a foreign key to the dances table, and that foreign key can be defined to cascade updates. Then, if a particular dancer's birthdate is found to be in error and corrected, the DBMS will automatically handle updating all the places in the dance table where that dancer was listed. Moving the control of the redundancy from the user to the system is a big step forward that you can get with today's SQL DBMS'.

All of this information is paraphrased from Date's excellent book Database Design and Relational Theory which provides a significant amount of thinking and detail around just this issue. It is indeed the case that we stand on the shoulders of giants.