Sql-server – Many:Many with Shared Relation

Architecturedatabase-designforeign keyschemasql server

I'm modelling data with multiplicity like this:

enter image description here

Each Composition/Anthology related pair must share a Composer. Also, each Anthology must contain at least one Composition. How would you recommend I model this?

Here is one possible representation with nearly enforced consistency (it doesn't enforce the 1+ Composition : 0+ Anthology multiplicity). However it duplicates FK_Composer lots of places (which as a side annoyance breaks some of my Entity Framework features).

Composer    Composition    junction          Anthology
--------    -----------    --------------    ---------
                           FK_Anthology   -> PK
PK       <- FK_Composer <- FK_Composer    -> FK_Composer
            PK          <- FK_Composition

Note: I'm also trying to hack this out at the business logic and ORM layers and have hit obstacles there as well.

Best Answer

The question I would ask is whether the direct relationship between Anthology and Composer is "important" to the system? There are all kinds of incidental relationships between tangible things that are recorded in any system. However, only certain of these are important for the purposes of the system itself. These are the ones that belong in a relational schema.

If anthologies will always be made up of compositions, and compositions will always have composers, then you can always derive the relationship between anthologies and composers using a query. If you did it that way there would be no risk of inconsistent relationships.

This model would look like this:

ERD

You would have table definitions something like this:

create table Composer
( composer_id int IDENTITY
, composer_name nvarchar(50)
, biography nvarchar(1000) null
, other_composer_info nvarchar(1000) null
, constraint pk_composer primary key(composer_id)
);

create table Composition
( composition_id int IDENTITY
, composer_id int
, composition_name nvarchar(50)
, composed_on datetime
, constraint pk_composition primary key(composition_id)
, constraint fk_composition_composer 
    foreign key (composer_id) references Composer(composer_id)
);

create table Anthology
( anthology_id int IDENTITY
, anthology_name nvarchar(50)
, constraint pk_anthology primary key (anthology_id)
);

create table AnthologyItem
( anthology_id int
, composition_id int
, constraint pk_anthology_item primary key 
   (anthology_id, composition_id)
, constraint fk_item_anthology (anthology_id)
   references Anthology(anthology_id)
, constraint fk_item_composition (composition_id)
   references Composition(composition_id)
);

The good thing about this is that it has no redundant data to get out of synch. The problem is that it doesn't quite fit Shannon's requirement that an anthology be about one composer and that all of the compositions in that anthology must be from the same composer.

Unfortunately, this is not an easy problem to solve with declarative referential constraints. Declarative constraints are great for making sure that everything within a row makes sense. What they aren't built to do is enforce rules between rows.

There is a declarative way to solve this problem, but it involves a trade-off that many people wouldn't like, because it smells an awful lot like violating normalization. Some people would argue (Mike Sherril comes to mind) that this solution doesn't literally violate normalization rules, but people who are less well attuned to the actual rules of normalization will probably look at this solution with skepticism.

So what is this controversial solution? It looks like this:

ERD2

Note that the primary keys of some of these tables have been modified. Here is the SQL DDL for the solution: (You need to scroll it to the bottom to see the magic.)

create table Composer
( composer_id int IDENTITY
, composer_name nvarchar(50)
, biography nvarchar(1000) null
, other_composer_info nvarchar(1000) null
, constraint pk_composer primary key(composer_id)
);

create table Composition
( composition_id int IDENTITY
, composer_id int
, composition_name nvarchar(50)
, composed_on datetime
, constraint pk_composition 
    primary key(composer_id, composition_id) -- NOTE CHANGE!
, constraint fk_composition_composer 
    foreign key (composer_id) references Composer(composer_id)
);

create table Anthology
( anthology_id int IDENTITY
, composer_id int                            -- THIS IS NEW!
, anthology_name nvarchar(50)
, constraint pk_anthology 
    primary key (composer_id, anthology_id)  -- THIS IS DIFFERENT
);

create table AnthologyItem
( composer_id int                            -- THIS IS NEW!
, anthology_id int
, composition_id int
, constraint pk_anthology_item primary key   -- THIS HAS CHANGED.
   (composer_id, anthology_id, composition_id)
, constraint fk_item_anthology 
    foreign key (composer_id, anthology_id)
    references Anthology(composer_id, anthology_id)
, constraint fk_item_composition
    foreign key (composer_id, composition_id)
    references Composition(composer_id, composition_id)
);

Note that the way this works is you have to impose a single composer on an anthology by making the composer part of the anthology's primary key. You do the same thing with composition. Then, when you create an intersection between composition and anthology, you have the composer ID twice. You can then use a check constraint to declaratively enforce that compositions and anthologies have not only a single composer, but the same one.

NOTE: I'm not saying you should do this, I'm just saying you could do this. YMMV etc.

Related Solutions

How shared nothing and shared everything architectures look like in Oracle

Shared nothing typically refers to hardware, though it could also be used when describing an SOA architecture.

From a hardware perspective, a "shared nothing" Oracle database simply means a machine using local disks, memory, etc.

Oracle RAC (real application clusters) is a shared disk architecture, but not shared memory. Horizontal scaling is then achieved by adding more machines to the cluster.

When talking in SOA terms, "shared nothing" means that each service has a corresponding database which is only accessed by that service. So the ACCOUNTS service accesses the ACCOUNTS_DB, ORDERS service the ORDERS_DB and so on. These databases could be shared nothing from a hardware perspective as well, or use RAC.

Ensuring consistency of data and references which would normally be handled using foreign keys becomes a challenge in SOA shared nothing databases.

Sharding typically refers to partitioning managed at the application level, rather than within the database. For example, you could partition accounts by email address and direct customers with address starting A-C to ACCOUNTS_DB01, D-F ACCOUNT_DB02 and so on. The shard mapping could be a simple range like this, a function on the input or a lookup database stating which database is stored in. The databases would be "hardware shared nothing" in this case as the idea is you use relatively cheap machines which are easily added and replaced.

You could shard your databases at the application level and still have Oracle partitioning at the table level within the database itself. So you could shard your ORDERS database by customer, then partition the orders table by order date as well inside the database.

The downside to both meanings of shared nothing comes if you frequently run queries that have to access several databases. In these cases your joins will be pushed into the application layer rather than the DB layer so are likely to be slower. Good governance is necessary to ensure this doesn't happen.

Database Design – different objects with shared tagging

The problem with your first example is the tri-link table. Is that going to require one of the foreign keys on either report or recommendations to always be NULL so that keywords link only one way or the other?

In the case of your second example, the joining from the base to the derived tables now may require use of the type selector or LEFT JOINs depending on how you do it.

Given that, why not just make it explicit and eliminate all the NULLs and LEFT JOINs?

Reports
----------
ReportID
ReportName


Recommendations
----------
RecommendationID
RecommendationName
ReportID (foreign key)


Keywords
----------
KeywordID
KeywordName


ReportKeywords
----------
KeywordID (foreign key)
ReportID (foreign key)

RecommendationKeywords
----------
KeywordID (foreign key)
RecommendationID (foreign key)

In this scenario when you add something else which needs to be tagged, you just add the entity table and the linkage table.

Then your search results look like this (see there is still type selection going on and turning them into generics at the object results level if you want a single results list):

SELECT CAST('REPORT' AS VARCHAR(15)) AS ResultType
    ,Reports.ReportID AS ObjectID
    ,Reports.ReportName AS ObjectName
FROM Keywords
INNER JOIN ReportKeywords
    ON ReportKeywords.KeywordID = Keywords.KeywordID
INNER JOIN Reports
    ON Reports.ReportID = ReportKeywords.ReportID
WHERE Keywords.KeywordName LIKE '%' + @SearchCriteria + '%'
UNION ALL
SELECT 'RECOMMENDATION' AS ResultType
    ,Recommendations.RecommendationID AS ObjectID
    ,Recommendations.RecommendationName AS ObjectName
FROM Keywords
INNER JOIN RecommendationKeywords
    ON RecommendationKeywords.KeywordID = Keywords.KeywordID
INNER JOIN Recommendations
    ON Recommendations.RecommendationID = RecommendationKeywords.ReportID
WHERE Keywords.KeywordName LIKE '%' + @SearchCriteria + '%'

No matter what, somewhere there is going to be type selection and some kind of branching going on.

If you look at how you would do this in your option 1, it's similar but with either a CASE statement or LEFT JOINs and a COALESCE. As you expand your option 2 with more things being linked, you have to keep adding more LEFT JOINs where things are typically NOT being found (an object that is linked can only have one derived table which is valid).

I don't think there is anything fundamentally wrong with your option 2, and you could actually make it look like this proposal with a use of views.

In your option 1, I have some difficulty seeing why you opted for the tri-link table.

Best Answer

Related Solutions

How shared nothing and shared everything architectures look like in Oracle

Database Design – different objects with shared tagging

Related Question