Database Design – How Normalization Fixes Update Anomalies

database-designnormalizationrelational-theory

I have been reading 'Database Systems: A practical approach to design, implementation and management' by Connolly and Begg in order to understand database normalization (chapter 14). I now better understand the three database normalization forms:

  • First normal form (1NF)
  • Second normal form (2NF)
  • Third normal form (3NF)

I also understand the three update anomalies:

  • Insertion anomaly
  • Deletion anomaly
  • Update / Modification anomaly

What I am struggling to understand now is how to link the two together. For example, which of the different forms help(s) to fix the insertion anomaly? Similarly for the other anomalies. I would ideally like to know the mapping relationship between the two groups and why those forms fix the certain anomaly.

This is my first question on here and I searched on Google and this site to try and find an answer first, but to no avail.

Thank you.

Best Answer

1NF is basically just "don't keep too much data in a single column", so I think that 2NF and 3NF are the primary fix for all 3 database anomalies, since both 2NF and 3NF involve breaking out items into their own tables:

  1. Insertion anomaly: If you have one big enrollment table that includes both "class" and "student" data (neither of which exists elsewhere), then you can't enter a new (empty) course without at least one corresponding student (because the table is a record of your enrollments). So, apply 2NF and create separate tables for classes, students, and make your original enrollment table link to both by ClassID and StudentID. Now you can enter new classes with no students, and new students with no classes.

  2. Deletion anomaly: Same as above, if each row of your original enrollment table contains the full details of the student and the full details of the class they are enrolled in, then removing the last enrolled student for a class removes the last bit of information about that class. The solution is the same, apply 2NF to make separate tables, so that students can be enrolled or unenrolled without losing class information.

  3. Update anomaly: Same as above, using the single-table method, updating information (say, the room number) for a class with multiple students enrolled might lead to a situation where some rows have been new information and other rows have the old. Applying 2NF as above is again the solution, so that class data is changed in only one place (the classes table).

Note that 1NF still plays some role in your process, as it means you can't try to solve the problem by cramming an entire list of enrolled students into a single field, or add student1, student2, student3 fields or something like that.

We could come up with similar examples where the factor in play is 3NF instead of 2NF: If each "student" has a faculty advisor (and some advisors are assigned to multiple students), that might not be part of the key for our student table, but it is "dependent attribute", and could lead to some of the same problems as above. So faculty advisor could be broken out into its own table.

A few resources I found helpful: