Database Design – Making Database Tables 1NF

database-designnormalizationrelational-theory

I am creating a database for a DVD rental shop, I have various entities that are related to this question, such as Film, FilmStar.

For each film, you record its unique number, title, the year in which it was made, its category (action adventure, science fiction, horror, romance, comedy, classic, children's), its director, and all stars that appeared in it. For each film, you also want to store the type of DVD hire (new release, classics, other).

I am mostly unsure about "all the stars that appeared in it". I first thought just having an attribute in the 'Film' Entity, for example filmStar and then each star would be inserted into that attribute, for example: "John Doe, Jane Doe" for each film. But then I realised that this wouldn't be 1NF as : "the domains of attributes must include only atomic values, the value of an attribute must be a single value from the domain of that attribute", as it contains more than one value and isn't atomic.

I then thought about having a separate entity that contains certain attributes such as: filmID, filmStarID. So John Doe would have the filmStarID of '0001' (all of this would be in the FilmStar entity, which is a separate entity). But then the same problem would occur, for example the filmID attribute would have all of the filmID's that the filmStar has starred in, for example: John Doe would have "101, 115, 009". Which again wouldn't be 1NF.

I was just wondering what your thoughts are on this?

Best Answer

You're on the right track with splitting up your domains and using a linking entity. Separating the Films from the Stars is absolutely the first step.

 ---------------------------    -------------------------
| FilmID |     FilmName     |  | StarID |   StarName     |
|--------+------------------|  |--------+----------------|
| EVIL01 | The Evil Dead    |  | 000001 | Bruce Campbell |
| EVIL02 | Evil Dead II     |  | 000002 | Sarah Berry    |
| EVIL03 | Army of Darkness |  | 000003 | Marcus Gilbert |
 ---------------------------    -------------------------

The missing piece is tying the two together. Obviously, stuffing the StarIDs into the Film table is in violation of the first normal form, since you end up needing to store multiple references to actors in a single row.

 --------------------------------------------
| FilmID |     FilmName     |     StarIDs    |
|--------+------------------+----------------|
| EVIL01 | The Evil Dead    | 000001         | 
| EVIL02 | Evil Dead II     | 000001, 000002 |   Wrong,
| EVIL03 | Army of Darkness | 000001, 000003 |   Wrong
 --------------------------------------------

Splitting the offending column out to its own table restores the integrity of the original Film table, but continuing to store multiple references in a single attribute still invalidates 1NF for the data model.

 ---------------------------    -------------------------
| FilmID |     FilmName     |  | FilmID |     StarIDs    |
|--------+------------------|  |--------+----------------|
| EVIL01 | The Evil Dead    |  | EVIL01 | 000001         |
| EVIL02 | Evil Dead II     |  | EVIL02 | 000001, 000002 |   Still
| EVIL03 | Army of Darkness |  | EVIL03 | 000001, 000003 |   Wrong
 ---------------------------    -------------------------

To rectify this is fairly simple, as you merely need to associate each FilmID with the respective StarID, bringing your model in-line with the tenants of 1NF.

 ---------------------------    -----------------    -------------------------
| FilmID |     FilmName     |  | FilmID | StarID |  | StarID |   StarName     |
|--------+------------------|  |--------+--------|  |--------+----------------|
| EVIL01 | The Evil Dead    |  | EVIL01 | 000001 |  | 000001 | Bruce Campbell |
| EVIL02 | Evil Dead II     |  | EVIL02 | 000001 |  | 000002 | Sarah Berry    |
| EVIL03 | Army of Darkness |  | EVIL02 | 000002 |  | 000003 | Marcus Gilbert |
 ---------------------------   | EVIL03 | 000001 |   -------------------------
                               | EVIL03 | 000003 |
                                -----------------

Now, with the the data normalized in this manner, it is easy to add attributes to expand the model. Say an actor's character name in a movie was desirable, you could add CharacterName to the FilmStar table - maybe useful for getting a list of everybody who's ever played "James Bond," for instance.