Losslessly decompose this table

best practicesdatabase-designnormalizationrelational-theoryschema

I've stumbled across a database design problem that's out of my league, and my go-to DBA guru is off on fire drills.

In essence, I have a table with the following primary key (PK for brevity):

child_id   integer
parent_id  integer
date       datetime

child_id and parent_id are foreign keys to entity tables. The "child" table itself also contains a foreign key to the "parent" table, and lo, each child_id always references the same parent_id as expected by the table above. In fact, turns out there's some extra code keeping the two in sync.

Which makes this overenthusiastic normalization novice say "I should remove the redundancy instead!"

I decompose to the following:

Table_1 PK:
child_id   integer
date       datetime

Table_2 PK:
parent_id  integer
date       datetime

Table_3: (already exists)
child_id   integer PRIMARY KEY
parent_id  integer FOREIGN KEY

And lo, when I join these guys together in the natural way, I recover the original table. It's my understanding that makes this 5NF.

However, now I realize there's a hidden business rule.

Normally, the dates associated with a given child_id must be a subset of the dates associated with the corresponding parent_id. You can see that the first table enforces this rule.

My decomposition does not enforce the rule, because you can freely add to Table 1 until the dates get too large.

Which leads me here, with the following questions:

  1. Is this decomposition 5NF? While I'd say it permits insertion anomalies, it also seems to follow the Wiki example, which itself follows this guide. The phrase (emphasis mine) "we can reconstruct all the true facts from a normalized form consisting of three separate record types" gives me special pause, since no matter how much garbage I pump into Table_1, the natural join still ignores it.

  2. Supposing I don't like this decomposition (I don't). I freely acknowledge that the practical solution is to leave the table and code as they are. But, in theory, is there a way to decompose and/or add constraints such that I get away from the first table and preserve my business rules?

Best Answer

Normalization is based on functional dependencies. Functional dependencies have to do with semantics; they have to do with what the data means. When you simplify a real-world problem to the level of "parent_id, child_id, date", and you don't include any sample data, you really limit how much help a conscientious database designer can give you.

The fact that you have a key {child_id, parent_id, date} in one table, and that you have (it seems) a unique pair {child_id, parent_id} in the child table doesn't necessarily mean that part of the combination is redundant. It might mean that in the table that has {child_id, parent_id, date} as the primary key, the pair of attributes {child_id, parent_id} ought to reference the child table in the first place.

If that's the case, you might use FOREIGN KEY (child_id, parent_id) REFERENCES child (child_id, parent_id). To do that, you need a UNIQUE constraint on the pair of columns (child_id, parent_id) in the table "child", which shouldn't be a problem if child_id is its primary key.

But there's no way to tell without knowing what the data means, and you're the only one in this thread who knows that. (But we'd be happy to let you explain it to us.)

As far as the original table is concerned, you seem to be saying that child_id -> parent_id. If that's the case, why is parent_id in the original table in the first place? Why isn't the key just (child_id, date), with a foreign key reference to the "child" table? It looks to me like the kind of redundancy you're talking about might be resolved by dropping the column "parent_id".

SQL DDL and sample data in the form of INSERT statements helps us help you. DDL and INSERT statements are more precise than descriptions.