When should I extract a field into a new table

database-design

Given a table :

CAR
---
id
model_id
condition
fuel_id
release_year

Since many cars would have the same Model and same Fuel, there are 1:N tables created.

Though, many Car objects can have the same release_year.

Should I create a Year table and then a year_id field into Car?

Design-wise, it seems accurate, but it seems weird to create a Year table containing a list of years.

Still, this is actually what I do for Fuel : a table containing a list of fuels types.

When is it accurate to create a new 1:N table instead of a field ?

Best Answer

It depends... as a lawyer might say The columns in your car table might seem to be data that will not change but let's take a closer look:

  • condition: this is definitely a temporal fact. Today the car is new, drive it off the lot and it is used, wait long enough and it is an antique. You might have to allow for the possibility that the program will store data for the same vehicle more than once during it's lifetime. Lease it as new, get it back as used, sell it....
  • release year: will an auto manufacturer always officially release a car on a discrete date? Or is it possible that it will be shown at an auto show this year, released in Europe next year and released in North America a year after that?

My experience with data modelling is that the most common mistake is to assume that a fact like a year or a condition will never change when it is really slowly changing data.

For your purposes if there is any possibility that a release year could have more than one value or span over a year end or be amended by the manufacturer then you could have a release_year_id in CAR which is a foreign key to the RELEASE_YEAR table which would have:

ID NUMBER(10)
YEAR_START  DATE
YEAR_END    DATE
GEO_AREA    NUMBER(10)
CURRENT_VALUE  NUMBER(1) and constrained to value of 1 or 0

The current value field would be implemented if there is the possibility of the start of the release year or end of the release year changing and you wanted to record this. Further information is described temporal databases.

Your final reality check is to create some sample reporting queries and find out how much it hurts. This will give you a good indication of whether to normalize or denormalize

As Jeff Atwood wrote

normalize until it hurts, denormalize until it works