Database Design – How to Normalize a Very Small Database

normalization

I am designing a very small database (c. 250 rows). I have a number of data points for each record which, under normalization rules, ought to be in separate tables.

For instance:

  1. Percentages of a number of different categories (which should add up to 100)
  2. Multiple sources of data (a record may come from up to 4 data sources, and the reference within each of these sources should be recorded)
  3. Columns with very different units: for instance, I want to store residential units and gross floor area (GFA) delivered.

All these fields are optional, none are mandatory.
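
For reference, normalizing case 2 would mean a separate source-reference table along these lines. This is only a sketch; the table and column names are invented, and it assumes SQLite via Python's `sqlite3`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE record (id INTEGER PRIMARY KEY, name TEXT);

    -- One row per (record, source) pair, instead of four nullable
    -- source/reference column pairs on the main table.
    CREATE TABLE record_source (
        record_id INTEGER NOT NULL REFERENCES record(id),
        source    TEXT    NOT NULL,   -- which data source (hypothetical values)
        reference TEXT    NOT NULL,   -- the record's reference within that source
        PRIMARY KEY (record_id, source)
    );

    INSERT INTO record VALUES (1, 'Project A');
    INSERT INTO record_source VALUES (1, 'Survey 2021', 'S-0042');
    INSERT INTO record_source VALUES (1, 'Planning register', 'PR/1187');
""")
for row in conn.execute(
        "SELECT source, reference FROM record_source WHERE record_id = 1"):
    print(row)
```

The composite primary key also gives you a free integrity check: the same source cannot be recorded twice for one record.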

I am considering not normalizing in case 1, because I want a data integrity check that all the relevant columns in each database row add up to 100%.
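
That kind of row-level check can be expressed as a table `CHECK` constraint, which works whether or not you normalize elsewhere. A minimal sketch, assuming SQLite and invented category columns (all optional, per the question):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE project (
        id              INTEGER PRIMARY KEY,
        pct_retail      REAL,
        pct_office      REAL,
        pct_residential REAL,
        -- Either all percentages are NULL (the fields are optional),
        -- or the non-NULL ones must sum to exactly 100.
        CHECK (
            (pct_retail IS NULL AND pct_office IS NULL
             AND pct_residential IS NULL)
            OR (COALESCE(pct_retail, 0) + COALESCE(pct_office, 0)
                + COALESCE(pct_residential, 0) = 100)
        )
    )
""")
conn.execute("INSERT INTO project VALUES (1, 30, 20, 50)")      # passes
try:
    conn.execute("INSERT INTO project VALUES (2, 60, 20, 50)")  # sums to 130
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

One caveat: with `REAL` columns an exact `= 100` comparison can fail for fractional percentages due to floating-point rounding, so a tolerance (or integer percentages) may be safer.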

For the other two cases, normalizing seems like a lot of work for not much gain. For case 3, I would have to design two additional tables, one to store the data:

| ID (prim key) | project (FK) | type (FK) | value (float) |
|---------------|--------------|-----------|---------------|
| 1             | 1            | 1         | 200.3         |

and another to store the column metadata:

| typeID (prim key) | Description |
|-------------------|-------------|
| 1                 | GFA (m2)    |

rather than just do this in the main table:

| RecordID(prim key) | GFA(m2) | Residential units | .... |
|--------------------|---------|-------------------|------|
| 1                  | 200     | 25                | .... |
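
The normalized version (the first two tables) can be sketched as follows, with made-up names, to show that the wide row is still recoverable with a join:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE measure_type (
        type_id     INTEGER PRIMARY KEY,
        description TEXT NOT NULL          -- e.g. 'GFA (m2)'
    );
    CREATE TABLE measure (
        id      INTEGER PRIMARY KEY,
        project INTEGER NOT NULL,
        type_id INTEGER NOT NULL REFERENCES measure_type(type_id),
        value   REAL NOT NULL
    );

    INSERT INTO measure_type VALUES (1, 'GFA (m2)'), (2, 'Residential units');
    INSERT INTO measure VALUES (1, 1, 1, 200.3), (2, 1, 2, 25);
""")
# Reassemble the wide row for project 1 with a join.
for desc, value in conn.execute("""
        SELECT mt.description, m.value
        FROM measure m JOIN measure_type mt USING (type_id)
        WHERE m.project = 1"""):
    print(desc, value)
```

A side effect of this shape is that optional fields cost nothing: a missing measure is simply an absent row rather than a NULL column.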

With such small data, my main concern is updating things if the schema changes, but I can't think of a reason that would be harder with the denormalized data.

Can anyone give me a good reason to normalize for cases 2 and 3?

Thanks

Best Answer

For a small database with just a few hundred rows, you could argue that the right design is whatever you can maintain and sustain. However, a few things to think about:

  1. Remember that the first implementation is not the future implementation. Planning ahead just a little can save headaches in the future.
  2. If the database scales up in the future, you might find yourself refactoring tables and code.
  3. Since the columns of the denormalized main table are named GFA(m2), Residential units, et cetera, one per thing you measure, you are committing to altering that table repeatedly as new measures are added.
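
To make point 3 concrete: in the normalized design a new measure is a data change, while in the wide table it is a schema change. A sketch with invented names, assuming SQLite:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE measure_type (type_id INTEGER PRIMARY KEY, description TEXT);
    CREATE TABLE wide (record_id INTEGER PRIMARY KEY,
                       gfa_m2 REAL,
                       residential_units REAL);
""")

# Normalized: a new measure is just another row; the schema is untouched
# and existing queries and code keep working.
conn.execute("INSERT INTO measure_type VALUES (3, 'Parking spaces')")

# Denormalized: every new measure alters the table itself, and any code
# that reads the table's columns may need to change with it.
conn.execute("ALTER TABLE wide ADD COLUMN parking_spaces REAL")
```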

By the way, I note that your value (float) of 200.3 becomes 200 in the final table. (Maybe just a typo.) Either way, be sure to define column data types that fit the data you are storing.

And, yes, if I were delivering this to someone, I would normalize it further.