Mysql – Database normalization: Which is the best fit

database-designMySQLnormalization

I have 3 different types of data set:

Notification:

start_date
end_date
notification_type
country_id
state_id

Campaign:

start_date
end_date
user_notification

Transaction

email
transaction_value

All these 3 have the same types of child tables where there will be foreign keys pointing to these tables.

What type of design will be best?

3 different tables and one child table that will have 3 different columns as primary keys pointing to each different parent?
1 combined parent table allowing null values for columns which will be not needed for any particular type and a main_type column which says whether each record is a campaign, a notification or a transaction. The child table will have only one foreign key column.
3 different tables, 1 for each of the main table types with a child table for each main table that uses the main table's primary key as a foreign key.

Best Answer

Having the same types of columns is not the same thing as being the same entity type. Normalization is not the same thing as code reuse.

Normalization is about arranging your columns into tables in such a way that you avoid insert, update and delete anomalies. It is about reducing the kinds of redundancy that can lead to these anomalies, it isn't about putting things together because they look alike.

As a first choice, you should let your database engine use its declarative constraints (foreign keys, unique keys, etc.) to protect the integrity of your data. This saves writing application code to do this and makes your system less buggy in the long run.

This means that you should aim for at least third normal form (3NF) by default and then consider later whether any denormalization is necessary.

From this perspective, your first option is not good, because you are jamming three different kinds of child data into one table. You can't control referential integrity as cleanly or easily this way. The only two conventional, practical options you have are 2 and 3.

Choosing between options 2 and 3 depends on the circumstances, and may depend quite a bit on your personal preferences. Some people would avoid option 2 because they believe all nulls are evil. Other people would avoid option 3 because they don't like having too many different tables. This is a "holy war" debate without a definitive, logical answer.

For practical purposes, looking at your three main tables, I would say that they don't look like they have very much in common. There is a little overlap, but not very much. Therefore, I would tend towards option 3. Keep your tables separate so that your code and your data stay clean.

As an aside: You don't show the details of the child table(s). If these three child tables have an identical structure and you're worried about having application code that is duplicated, consider creating a class that handles the child table structure and inheriting from this class for each of the three child tables. This will give you code reuse without compromising data integrity.

Disclaimer

This is experimental and only tested rudimentarily. Proceed at your own risk. I would not use it myself and just drop / recreate constraints with standard DDL commands. If you break entries in the catalog tables you could easily mess up your database.

For all I know, there are only two differences between a PRIMARY KEY and a UNIQUE constraint in the catalog tables (the index itself is identical):

pg_index.indisprimary:
For PRIMARY KEY constraint ... TRUE
For UNIQUE constraint ... FALSE

pg_constraint.contype:
PRIMARY KEY constraint ... 'p'
UNIQUE constraint ... 'u'

You could convert constraint and index in place, from PRIMARY KEY constraint to UNIQUE constraint, my_idx being the (optionally schema-qualified) index name:

UPDATE pg_index SET indisprimary = FALSE WHERE indexrelid = 'my_idx'::regclass
UPDATE pg_constraint SET contype = 'u' WHERE conindid = 'my_idx'::regclass;

Or upgrade from UNIQUE to PRIMARY KEY:

UPDATE pg_index SET indisprimary = TRUE WHERE indexrelid = 'my_idx'::regclass;
UPDATE pg_constraint SET contype = 'p' WHERE conindid = 'my_idx'::regclass;

Sql-server – How to represent class table inheritance (current DBMS-specific way please)

Ultimately, I implemented ypercube's suggestion from comments:

The "type" column can be defined as a computed (but constant) column in SQL-Server. I think it has to be PERSISTED though so it can participate in the Foreign Key constraint.

This worked well both for performance and compatibility with with my tools (Entity Framework <= 6.0):

CREATE TABLE [dbo].[Account](
    [Id] [int] IDENTITY(1000,1) NOT NULL PRIMARY KEY CLUSTERED,
    [CommunityRole] [int] NOT NULL,
    [FirstName] [nvarchar](50) NOT NULL,
    [LastName] [nvarchar](50) NOT NULL,
CONSTRAINT [UX_Derived_Relation] UNIQUE ([Id], [CommunityRole]))

CREATE TABLE [dbo].[Recruiter](
    [Id] [int] NOT NULL PRIMARY KEY CLUSTERED,
    [CommunityRole]  AS ((1)) PERSISTED NOT NULL,
    [RecruiterSpecificValue]  [int] NOT NULL,
FOREIGN KEY ([Id], [CommunityRole]) REFERENCES Account([Id], [CommunityRole]))

CREATE TABLE [dbo].[Candidate](
    [Id] [int] NOT NULL PRIMARY KEY CLUSTERED,
    [CommunityRole]  AS ((2)) PERSISTED NOT NULL,
    [CandidateSpecificValue]  [int] NOT NULL,
FOREIGN KEY ([Id], [CommunityRole]) REFERENCES Account([Id], [CommunityRole]))

This mapped well to my implementation of multiple discrete account types, Recruiter and Candidate, on my Job Board.

Notification:

Campaign:

Transaction

Best Answer

Related Solutions

Postgresql – What happens to the index of a primary key after a DROP CONSTRAINT

Disclaimer

Sql-server – How to represent class table inheritance (current DBMS-specific way please)

Related Question