Mysql – Database normalization: Which is the best fit

database-designMySQLnormalization

I have 3 different types of data set:

Notification:

  • start_date
  • end_date
  • notification_type
  • country_id
  • state_id

Campaign:

  • start_date
  • end_date
  • user_notification

Transaction

  • email
  • transaction_value

All these 3 have the same types of child tables where there will be foreign keys pointing to these tables.

What type of design will be best?

  1. 3 different tables and one child table that will have 3 different columns as primary keys pointing to each different parent?
  2. 1 combined parent table allowing null values for columns which will be not needed for any particular type and a main_type column which says whether each record is a campaign, a notification or a transaction. The child table will have only one foreign key column.
  3. 3 different tables, 1 for each of the main table types with a child table for each main table that uses the main table's primary key as a foreign key.

Best Answer

Having the same types of columns is not the same thing as being the same entity type. Normalization is not the same thing as code reuse.

Normalization is about arranging your columns into tables in such a way that you avoid insert, update and delete anomalies. It is about reducing the kinds of redundancy that can lead to these anomalies, it isn't about putting things together because they look alike.

As a first choice, you should let your database engine use its declarative constraints (foreign keys, unique keys, etc.) to protect the integrity of your data. This saves writing application code to do this and makes your system less buggy in the long run.

This means that you should aim for at least third normal form (3NF) by default and then consider later whether any denormalization is necessary.

From this perspective, your first option is not good, because you are jamming three different kinds of child data into one table. You can't control referential integrity as cleanly or easily this way. The only two conventional, practical options you have are 2 and 3.

Choosing between options 2 and 3 depends on the circumstances, and may depend quite a bit on your personal preferences. Some people would avoid option 2 because they believe all nulls are evil. Other people would avoid option 3 because they don't like having too many different tables. This is a "holy war" debate without a definitive, logical answer.

For practical purposes, looking at your three main tables, I would say that they don't look like they have very much in common. There is a little overlap, but not very much. Therefore, I would tend towards option 3. Keep your tables separate so that your code and your data stay clean.

As an aside: You don't show the details of the child table(s). If these three child tables have an identical structure and you're worried about having application code that is duplicated, consider creating a class that handles the child table structure and inheriting from this class for each of the three child tables. This will give you code reuse without compromising data integrity.