Tool to check if the database is normalized to the third normal form

database-designdatabase-recommendationnormalizationschema

I learned about normalization recently, and understand how important it is when implementing a new schema.

How can I check if my database is 2NF or 3NF compliant ?

Manual review is a sure option, but I'm looking for an automated tool here.

I'm not looking for a point-and-click tool, more something that would highlight possible optimizations to make a table 3NF compliant. I guess it might use statistics based on good sample data and/or semantic analysis of columns names.

Best Answer

Normalization absolutely is used in the real world... and hopefully you know that 3NF is only the third one of... what is is now, 8? But 3NF should be an easy target.

However... I would venture to say that there could not be such a tool.

Normalization, technically, is an attribute of each table. Within a given database, different tables may have different levels of normalization.

Each table represents facts... facts about instances of a certain type of thing (person, account, order, shipment, item, location) including, sometimes, foreign keys which lead you to other kinds of facts about that thing.

Normalization has to do with how accurately and efficiently facts are represented in the tables as well as the ability of table's design to prevent ambiguous and redundant data patterns.

Thus, an understanding of the actual facts is required... which is outside the scope of automated tools.

Q: Is a table with { student, subject, instructor } in 3NF?
A: What are students, subjects and instructors?

In a world where all instructors taught all subjects and each student could take any combination but not more than one course on each subject from each instructor, this table could indeed be said to be in 3NF. In the real world, making the claim of 3NF for this table is absurd.

To understand that is isn't in 3NF requires an understanding of the nature of the facts it represents. In our reality, this table is not going to be 3NF since (among other reasons) the subject and the instructor are associated together in ways that have nothing to do with the student. If we have the courses where instructors teach subjects stored elsewhere in our database, why would we copy both values here instead of a foreign key from the other table indicating that the student was signed up for the course? If the instructor is replaced, we have to change multiple records in multiple places.

The more normalized a database is, the more intrinsically consistent it is with the real world and with itself, and the more difficult it is for the database's facts to be inadvertently untrue. Database design is an art, but it is most definitely a science as well.

Even though I do not see eye-to-eye with everything he writes, I would recommend Chris Date's book, Database Design and Relational Theory: Normal Forms and All That Jazz which goes into excruciating detail about the underlying theory of the relational model.