Database Design – To Normalize or Not to Normalize

database-design, normalization, postgresql

In general, I understand that normalization is usually beneficial even with the join costs. However, I came up with an interesting dilemma recently.

What if the data is duplicated but unlikely to ever change? A change is possible, but I would not anticipate it.

I have a nutrients table with a unit column and the units would be g, kg, ug, etc.

I can't see these values ever changing.

I'm tempted to just store the unit as a column in the nutrients table rather than normalizing, which would mean a separate units table, a foreign key, and a join whenever I fetch a row from the nutrients table. At the same time, I know that in general, even with the join costs, we should normalize.

What should I do (and why)?

Best Answer

There is a difference between redundant data (bad) and coincidentally repeated data (not bad). Normalization is a technique which is used to avoid insert, update and delete anomalies. It is not meant to eliminate every repetition of a piece of data. Data which is static doesn't benefit from normalization.

A unit of measure, stated as a standardized abbreviation, is not the kind of data that you need to normalize out.

Think of it this way: if you were to normalize out your unit of measure into a separate table, you'd need a foreign key from your nutrients table to your units table. Is the unit of measure code going to be unique? Probably, yes. That makes it a candidate key for your units table, and if it's a candidate key in units, you can use it as the foreign key in nutrients.

The end result is that you have your unit of measure code in your nutrients table anyway even if you've normalized out the units.
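To make that concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for PostgreSQL. The column names (unit_code, amount, name) and the sample row are assumptions for illustration; only the nutrients and units tables come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked
conn.executescript("""
    -- Hypothetical schema: the unit code itself is the key of units.
    CREATE TABLE units (
        unit_code TEXT PRIMARY KEY          -- 'g', 'kg', 'ug', ...
    );
    CREATE TABLE nutrients (
        nutrient_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        amount      REAL NOT NULL,
        unit_code   TEXT NOT NULL REFERENCES units (unit_code)
    );
    INSERT INTO units VALUES ('g'), ('kg'), ('ug');
    INSERT INTO nutrients (name, amount, unit_code) VALUES ('protein', 12.5, 'g');
""")

# The unit is readable straight from nutrients; no join is required.
row = conn.execute("SELECT name, amount, unit_code FROM nutrients").fetchone()

# The foreign key still rejects codes that are not listed in units.
try:
    conn.execute(
        "INSERT INTO nutrients (name, amount, unit_code) VALUES ('iron', 3, 'furlong')"
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

With a natural key like this, the units table acts purely as a list of allowed values, and fetching a nutrient row costs the same as it would with a plain unit column.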

Here's when you would want to create a units table: when you have other predicates (columns) that depend on the unit code but not on the nutrient, for example a unit_type or a conversion factor to a base unit of the same type (grams for weight, etc.). Kept in your nutrients table, such a column would be a transitive functional dependency, and it doesn't belong there for that reason.
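A rough sketch of that case, again with sqlite3 standing in for PostgreSQL: the to_base column name, the factors shown, and the sample nutrients are assumptions chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Facts about the unit live with the unit, not with each nutrient.
    CREATE TABLE units (
        unit_code TEXT PRIMARY KEY,
        unit_type TEXT NOT NULL,    -- e.g. 'mass'
        to_base   REAL NOT NULL     -- multiplier to the base unit (grams for mass)
    );
    CREATE TABLE nutrients (
        name      TEXT PRIMARY KEY,
        amount    REAL NOT NULL,
        unit_code TEXT NOT NULL REFERENCES units (unit_code)
    );
    INSERT INTO units VALUES
        ('g', 'mass', 1.0), ('kg', 'mass', 1000.0), ('ug', 'mass', 0.000001);
    INSERT INTO nutrients VALUES ('protein', 0.5, 'kg'), ('fibre', 30.0, 'g');
""")

# The join is only needed when a query actually wants the unit's attributes.
in_grams = dict(conn.execute("""
    SELECT n.name, n.amount * u.to_base
    FROM nutrients AS n
    JOIN units AS u ON u.unit_code = n.unit_code
"""))
```

Here the conversion factor is stored once per unit, so correcting it is a single-row update instead of a change to every nutrient that uses that unit.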

Related Solutions

MySQL – Maximum normalization

No, you should not include a make_id column in the car_model table; it is implicitly defined by the series_id. If you need to see the make details, you could create a view that looks like your un-normalized table.
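Such a view might look roughly like the following sketch, written against sqlite3 for the sake of a runnable example. Only car_model, series_id and make_id appear in the question; the make and series table names, the remaining columns, the view name car_model_flat, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE make (
        make_id   INTEGER PRIMARY KEY,
        make_name TEXT NOT NULL
    );
    CREATE TABLE series (
        series_id   INTEGER PRIMARY KEY,
        make_id     INTEGER NOT NULL REFERENCES make (make_id),
        series_name TEXT NOT NULL
    );
    CREATE TABLE car_model (
        model_id   INTEGER PRIMARY KEY,
        series_id  INTEGER NOT NULL REFERENCES series (series_id),
        model_name TEXT NOT NULL
    );

    -- The view re-derives make_id through series instead of storing it twice.
    CREATE VIEW car_model_flat AS
        SELECT cm.model_id, cm.model_name,
               s.series_id, s.series_name,
               m.make_id, m.make_name
        FROM car_model AS cm
        JOIN series AS s ON s.series_id = cm.series_id
        JOIN make   AS m ON m.make_id = s.make_id;

    INSERT INTO make VALUES (1, 'ExampleMake');
    INSERT INTO series VALUES (10, 1, 'ExampleSeries');
    INSERT INTO car_model VALUES (100, 10, 'ExampleModel');
""")

flat = conn.execute(
    "SELECT model_name, series_name, make_id, make_name FROM car_model_flat"
).fetchone()
```

Queries that want the flat shape read from the view, while the base tables keep a single source of truth for which make a series belongs to.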

No, this design cannot be normalised further.

To force the year ranges to be non-overlapping, you could add a trigger to ensure that start_year >= max(end_year) and end_year >= start_year. (MySQL did not enforce check constraints before version 8.0.16, and a check constraint cannot look at other rows in any case.)

The trigger would look something like this:

delimiter //
create trigger trg_car_model_date_range_unique before insert on car_model
for each row
begin
    -- reject a row that starts before the latest end_year already stored
    if new.start_year < (select max(end_year) from car_model) then
        signal sqlstate '45000' set message_text = 'Date range is not unique';
    end if;
    -- reject a row whose range runs backwards
    if new.end_year < new.start_year then
        signal sqlstate '45000' set message_text = 'Date range is invalid';
    end if;
end//
delimiter ;
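To see how those two checks behave, here is a small sketch that ports the same logic to SQLite so it can run from Python's sqlite3 module. The RAISE(ABORT, ...) form is SQLite's syntax, not MySQL's, and the cut-down car_model table and the try_insert helper are assumptions made for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE car_model (
        model_id   INTEGER PRIMARY KEY,
        start_year INTEGER NOT NULL,
        end_year   INTEGER NOT NULL
    );
    -- Same two checks as the MySQL trigger, in SQLite's trigger dialect.
    CREATE TRIGGER trg_car_model_date_range_unique
    BEFORE INSERT ON car_model
    FOR EACH ROW
    BEGIN
        SELECT RAISE(ABORT, 'Date range is not unique')
        WHERE NEW.start_year < (SELECT MAX(end_year) FROM car_model);
        SELECT RAISE(ABORT, 'Date range is invalid')
        WHERE NEW.end_year < NEW.start_year;
    END;
""")

def try_insert(start_year, end_year):
    """Return None on success, or the trigger's message on rejection."""
    try:
        conn.execute(
            "INSERT INTO car_model (start_year, end_year) VALUES (?, ?)",
            (start_year, end_year),
        )
        return None
    except sqlite3.DatabaseError as exc:
        return str(exc)

first = try_insert(2000, 2005)      # empty table: MAX() is NULL, so nothing fires
touching = try_insert(2005, 2010)   # starts exactly at the previous end: allowed
overlap = try_insert(2008, 2012)    # starts before the latest end_year
backwards = try_insert(2015, 2011)  # ends before it starts
```

Note that, like the original, this compares against the latest end_year in the whole table and only fires on insert, so rows have to arrive in chronological order and updates are not checked.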

Database Design – How to Normalize a Very Small Database

For a small database with just a few rows, you might say that the design is whatever you can maintain and sustain. However, a few things to think about include:

Remember that the first implementation is not the future implementation. Planning ahead just a little can save headaches in the future.
If the database should scale up in the future you might find yourself refactoring tables and code.
Since the columns in Table3 are named GFA(m2), Residential units, et cetera, after whatever you are measuring, you are committing to changing that table every time a new measure is added.
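One way to avoid that, sketched here with sqlite3: store each measure as a row rather than as a column. The measure and measurement table names, the item_id column, and every value other than the 200.3 from the question are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One row per kind of measure instead of one column per kind.
    CREATE TABLE measure (
        measure_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL UNIQUE,
        unit       TEXT                  -- NULL for plain counts
    );
    CREATE TABLE measurement (
        item_id    INTEGER NOT NULL,     -- hypothetical id of the thing measured
        measure_id INTEGER NOT NULL REFERENCES measure (measure_id),
        value      REAL NOT NULL,        -- REAL so 200.3 is not cut down to 200
        PRIMARY KEY (item_id, measure_id)
    );
    INSERT INTO measure (name, unit) VALUES ('GFA', 'm2'), ('Residential units', NULL);
    INSERT INTO measurement VALUES (1, 1, 200.3), (1, 2, 12);
""")

# A new measure is a new row, not an ALTER TABLE.
conn.execute("INSERT INTO measure (name, unit) VALUES ('Parking spaces', NULL)")
conn.execute("INSERT INTO measurement VALUES (1, 3, 40)")

rows = conn.execute("""
    SELECT m.name, x.value
    FROM measurement AS x
    JOIN measure AS m USING (measure_id)
    WHERE x.item_id = 1
    ORDER BY m.measure_id
""").fetchall()
```

The trade-off is that all values share one column type, so this layout fits best when the measures are all numeric, as they appear to be here.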

By the way, I note that your values(float) of 200.3 becomes 200 in the final table. (Maybe that is just a typo.) Either way, be sure to define column data types that fit the data you are storing.

And, yes, if I were delivering this to someone, I would normalize it further.
