Multiple foreign keys with shared columns for weak entities

database-designforeign key

this is my first question at stackexchange.

I have a parent table (P) with two child (CC and CD):

CREATE TABLE P
(
    A CHAR(3), B CHAR(2),
    CONSTRAINT PK_P PRIMARY KEY(A, B)
)

CREATE TABLE CC
(
    A CHAR(3), B CHAR(2),
    C CHAR(1),
    CONSTRAINT PK_CC PRIMARY KEY(A, B, C),
    CONSTRAINT FK_CC_P FOREIGN KEY(A, B) REFERENCES P(A, B)
)

CREATE TABLE CD
(
    A CHAR(3), B CHAR(2),
    D CHAR(1),
    CONSTRAINT PK_CD PRIMARY KEY(A, B, D),
    CONSTRAINT FK_CD_P FOREIGN KEY(A, B) REFERENCES P(A, B)
)

I want a table (T) with optional references to CC and CD:

CREATE TABLE T
(
    TId INT AUTO_INCREMENT PRIMARY KEY,

    -- more fields

    CurrentA CHAR(3), CurrentB CHAR(2), 
    CurrentC CHAR(1),
    CurrentD CHAR(1),

    CONSTRAINT FK_T_CurrentCC 
        FOREIGN KEY(CurrentA, CurrentB, CurrentC)
        REFERENCES CC(A, B, C),

    CONSTRAINT FK_T_CurrentCD 
        FOREIGN KEY(CurrentA, CurrentB, CurrentD) 
        REFERENCES CD(A, B, D)
)

So that if both records Current CC and Current CD are present, they must refer to the same parent P record.

This is the first option, but I intuitively think it may create some problems which I'm not really sure about.

I'm also considering a second option:

CREATE TABLE TAlternative
(
    TId INT AUTO_INCREMENT PRIMARY KEY,

    -- more fields

    CurrentCCA CHAR(3), CurrentCCB CHAR(2), CurrentCCC CHAR(1),
    CurrentCDA CHAR(3), CurrentCDB CHAR(2), CurrentCDD CHAR(1),

    CONSTRAINT FK_TAlternative_CurrentCC 
        FOREIGN KEY(CurrentCCA, CurrentCCB, CurrentCCC) 
        REFERENCES CC(A, B, C),

    CONSTRAINT FK_TAlternative_CurrentCD 
        FOREIGN KEY(CurrentCDA, CurrentCDB, CurrentCDD) 
        REFERENCES CD(A, B, D),

    CONSTRAINT CHK_CurrentCC_CurrentDD_Same_P 
        CHECK
        (
            -- Parent not specified
            (
                (CurrentCCA IS NULL AND CurrentCDA IS NULL) 
                AND 
                (CurrentCCB IS NULL AND CurrentCDB IS NULL)
            )
            -- Same parent
            OR
            (
                (CurrentCCA = CurrentCDA)
                AND
                (CurrentCCB = CurrentCDB)
            )
        )
    )

Which would be better?

EDIT: I want the T table to be another entity with optional references to CC and/or CD, not a table to optimize (see the –more fields comment). Think about CC and CD being two kind of parts of a machine P that cannot exist without P (for example CC defines the positions inside the machine, and CD defines other kind of zones inside the machine), and T being a final product positioned in a position and/or zone.

Best Answer

Both solutions for T or T-alternate are considered denormalizations. Denormalizations optimize a datamodel that is normally in third normal form into a structure that is more convenient for select queries. The correct data already exists in CC and CD. T or T-alternate are duplicating the data. With duplicate (denormalized) data you have to make sure that data cannot get out of synch with its parent.

I think the first option is the better one. The constraints are simple and straight-forward. There is no check constraint on multiple fields with nulls. You need to make sure that any insert into CC or CD also inserts into T. After-insert triggers on CC and CD would automatically insert A,B,C,null into T or A,B,null,D into T.

One thing to plan for is that a parent has two children, each with many rows per parent. Lets assume a row in P has 3 rows in CC, 0 in CD. If you write a query over P inner joining to CC and CD, it shows zero rows. An left outer join (parent on left) shows 3 rows.

Lets assume CC has 3 rows and CD has 4 rows. Inner join and outer join queries would show 3*4 = 12 rows.

Lets assume a 3rd set of data. CC is the main child table with 20 rows per P. CD is a descriptive text field with 3 rows per P. You could reduce the resulting row to 20 per P with an aggregation function like Oracle 11g's LISTAGG over CD. This combines the D descriptive field into a comma separated list "Planes, Trains, Automobiles".

Having CCA and CDA fields in T-alternative increases the chances for data to get out of synch, since CCA might not equal CDA when something goes wrong in the code. The T-alternative is not the standard way of approaching this problem. The first option does have some challenges, but it is simpler, more standard, and more likely to hold up under production systems.