Confusion normalizing data to 3NF and transitive dependencies

database-designnormalization

I am currently in a Database class and have some confusion with normalizing to 3NF. I have one specific issue that was not addressed in class but appears in the book problems.

For example I have

R (ABCDEFGH)

AB-->D
B-->C
B-->E
A-->H
H-->G

I have two issues with the problem. First, when decomposing the transitive dependencies of B, should they be split into two tables of BC and BE or a single table BCE.

Second is how do I decompose H-->G. When I remove the partial dependency of A-->H into a separate table AH, the dependency H-->G no longer exists because I already removed H to put it into a separate table.

This is my solution to breaking this down.

R (ABCDEFGH)
R1 (ABCEFGH)  R1(ABD)
R2 (ABEFGH) R2(BC)
R3 (ABDFGH) R3(BE)
R4 (ABDFG) R4(AH)

I have no idea whether I am loosing the H-->G FD by breaking it down like this. However following the steps I had to break apart all the partial FDs before breaking down the transitive one.

Any help would be appreciated on my two issues.

Best Answer

You have:

R (ABCDEFGH)

AB-->D
B-->C
B-->E
A-->H
H-->G

I'm not sure how you resulted in the decompositions but the first thing to do usually is to find the candidate keys if the relations. Here there is only one, the ABF.

Then you identify that the relation is not in 2NF, as there are dependencies on parts of candidate keys and not the whole of them (this part seems to have been done it quite well.) The partial dependencies (on parts of the candidate keys) are these and you only missed the last one, that can be derived from the others:

AB-->D
B-->C
B-->E
A-->H
A-->G             -- missed this one

So (leaving the first question for the end), the decompositions to:

R (ABCDEFGH) 
                          -- splitting because of the AB -> D, fine
R*(ABCEFGH)  R1 (ABD)
                          -- splitting because of the B -> C, fine
R**(ABEFGH)  R2 (BC)
                          -- splitting because of the B -> E, fine
R***(ABFGH)  R3 (BE)

are rather fine, except some minor details like the D reappearing in the R relation.

The next part which is your second question regarding the A->H and H-G was problematic due to your missing of the implied A->G dependency. It should be:

                          -- splitting because of the A -> HG
R****(ABF)   R4 (AHG)

We have now resulted in 5 relations, all in 2NF:

R**** (ABF)
R1 (ABD)      -- AB-> D
R2 (BC)       -- B -> C
R3 (BE)       -- B -> E
R4 (AHG)      -- A -> HG ,  H -> G

In fact all are in 3NF except for R4. So, we have to split that into two relations:

R4* (AH)      -- A -> H
R5 (HG)       -- H -> G

We now have 6 relations (R****, R1, R2, R3, R4*, R5) which are all in 3NF and BCNF as well. In fact they are all in 4, 5 and 6NF.

Regarding your first question, you could have used one split into (BCE) instead of the two (BC) and (BE) and it would have been perfectly fine. Your final result would be five relations (instead of six), all in 3NF - and all the way up to BCNF, 4NF, 5NF, 6NF (except for that (BCE) which is in 5NF.)