Database Normalization – Ensure No Data Loss When Splitting Tables

denormalizationnormalization

This is w.r.t Database Normalisation, when I have a redundant table that causes anomalies, I would want to consider splitting it in smaller tables as per the NF rules.

BUT

Say I split a big table into two eg. Student and Marks Table my questions is —

How do I know I've done the right split?

Or to put it in another way,

How can I cross check if I haven't lost any information after the split i.e. I can still do a join on both tables and get my un-normalized table back?

This is easy to inspect visually when you have a small number of attributes, but what if the table is really huge.

How can I ensure that I've split the table in a right manner?

When I was in college my professor spoke of some equation which must always be true whenever you split tables, that ensures that the split is done without loss of any information. And by doing the right joins we can get the big dirty redundant table back.

Best Answer

A classical theorem in normalization theory says that you have a lossless join (that is a decomposition of a relation such that when you perform the natural join of the decomposed relations you obtain the original relation) in the case of two relations if and only if the intersection of the attributes of the relations contains a candidate key for one of the two relations.

This result can be generalized to a decomposition with any number of relations.