In general, I understand that normalization is usually beneficial even with the join costs. However, I came up with an interesting dilemma recently.
What if the data is duplicated but unlikely to ever change. It's possible, but I would not anticipate it.
I have a nutrients
table with a unit
column and the units would be g
, kg
, ug
, etc.
I can't see these values every changing.
I'm tempted to just put them as a column in the table rather than normalizing and having a units
table and using a foreign key and having to join whenever fetching a row from the nutrients
table. At the same time, I know in general, even with the join coins, we should normalize.
What should I do (and why)?
Best Answer
There is a difference between redundant data (bad) and coincidentally repeated data (not bad). Normalization is a technique which is used to avoid insert, update and delete anomalies. It is not meant to eliminate every repetition of a piece of data. Data which is static doesn't benefit from normalization.
A unit of measure, stated as a standardized abbreviation is not the kind of data that you need to normalize out.
Think of it this way: If you were to normalize out your unit of measure into a separate table, you'd need a foreign key from your
nutrients
table to yourunits
table. Is the unit of measure code going to be unique (probably, yes). Therefore it's a candidate key for yourunits
table. If it's a candidate key inunits
you could use it as a foreign key innutrients
.The end result is that you have your unit of measure code in your
nutrients
table anyway even if you've normalized out the units.Here's when you would want to create a
units
table. If you have other predicates (columns) that are dependent on the unit code, but not on the nutrient. For example, aunit_type
or a conversion factor to a base unit of the same type (grams for weight, etc.) This would be a transitive functional dependency in yournutrients
table and doesn't belong there for that reason.