Sql-server – Many:Many with Shared Relation

Architecturedatabase-designforeign keyschemasql server

I'm modelling data with multiplicity like this:

enter image description here

Each Composition/Anthology related pair must share a Composer. Also, each Anthology must contain at least one Composition. How would you recommend I model this?

Here is one possible representation with nearly enforced consistency (it doesn't enforce the 1+ Composition : 0+ Anthology multiplicity). However it duplicates FK_Composer lots of places (which as a side annoyance breaks some of my Entity Framework features).

Composer    Composition    junction          Anthology
--------    -----------    --------------    ---------
                           FK_Anthology   -> PK
PK       <- FK_Composer <- FK_Composer    -> FK_Composer
            PK          <- FK_Composition

Note: I'm also trying to hack this out at the business logic and ORM layers and have hit obstacles there as well.

Best Answer

The question I would ask is whether the direct relationship between Anthology and Composer is "important" to the system? There are all kinds of incidental relationships between tangible things that are recorded in any system. However, only certain of these are important for the purposes of the system itself. These are the ones that belong in a relational schema.

If anthologies will always be made up of compositions, and compositions will always have composers, then you can always derive the relationship between anthologies and composers using a query. If you did it that way there would be no risk of inconsistent relationships.

This model would look like this:

ERD

You would have table definitions something like this:

create table Composer
( composer_id int IDENTITY
, composer_name nvarchar(50)
, biography nvarchar(1000) null
, other_composer_info nvarchar(1000) null
, constraint pk_composer primary key(composer_id)
);

create table Composition
( composition_id int IDENTITY
, composer_id int
, composition_name nvarchar(50)
, composed_on datetime
, constraint pk_composition primary key(composition_id)
, constraint fk_composition_composer 
    foreign key (composer_id) references Composer(composer_id)
);

create table Anthology
( anthology_id int IDENTITY
, anthology_name nvarchar(50)
, constraint pk_anthology primary key (anthology_id)
);

create table AnthologyItem
( anthology_id int
, composition_id int
, constraint pk_anthology_item primary key 
   (anthology_id, composition_id)
, constraint fk_item_anthology (anthology_id)
   references Anthology(anthology_id)
, constraint fk_item_composition (composition_id)
   references Composition(composition_id)
);

The good thing about this is that it has no redundant data to get out of synch. The problem is that it doesn't quite fit Shannon's requirement that an anthology be about one composer and that all of the compositions in that anthology must be from the same composer.

Unfortunately, this is not an easy problem to solve with declarative referential constraints. Declarative constraints are great for making sure that everything within a row makes sense. What they aren't built to do is enforce rules between rows.

There is a declarative way to solve this problem, but it involves a trade-off that many people wouldn't like, because it smells an awful lot like violating normalization. Some people would argue (Mike Sherril comes to mind) that this solution doesn't literally violate normalization rules, but people who are less well attuned to the actual rules of normalization will probably look at this solution with skepticism.

So what is this controversial solution? It looks like this:

ERD2

Note that the primary keys of some of these tables have been modified. Here is the SQL DDL for the solution: (You need to scroll it to the bottom to see the magic.)

create table Composer
( composer_id int IDENTITY
, composer_name nvarchar(50)
, biography nvarchar(1000) null
, other_composer_info nvarchar(1000) null
, constraint pk_composer primary key(composer_id)
);

create table Composition
( composition_id int IDENTITY
, composer_id int
, composition_name nvarchar(50)
, composed_on datetime
, constraint pk_composition 
    primary key(composer_id, composition_id) -- NOTE CHANGE!
, constraint fk_composition_composer 
    foreign key (composer_id) references Composer(composer_id)
);

create table Anthology
( anthology_id int IDENTITY
, composer_id int                            -- THIS IS NEW!
, anthology_name nvarchar(50)
, constraint pk_anthology 
    primary key (composer_id, anthology_id)  -- THIS IS DIFFERENT
);

create table AnthologyItem
( composer_id int                            -- THIS IS NEW!
, anthology_id int
, composition_id int
, constraint pk_anthology_item primary key   -- THIS HAS CHANGED.
   (composer_id, anthology_id, composition_id)
, constraint fk_item_anthology 
    foreign key (composer_id, anthology_id)
    references Anthology(composer_id, anthology_id)
, constraint fk_item_composition
    foreign key (composer_id, composition_id)
    references Composition(composer_id, composition_id)
);

Note that the way this works is you have to impose a single composer on an anthology by making the composer part of the anthology's primary key. You do the same thing with composition. Then, when you create an intersection between composition and anthology, you have the composer ID twice. You can then use a check constraint to declaratively enforce that compositions and anthologies have not only a single composer, but the same one.

NOTE: I'm not saying you should do this, I'm just saying you could do this. YMMV etc.