Mysql – Multiple relationships issue

MySQLrelational-theory

Here is my SQLFiddle: http://sqlfiddle.com/#!2/affc4/9

A brief description of the issue:

I have the following data:

  • topics
  • brands
  • celebrities

All 3 of those data types can be associated to an article.

The associations exist in article_assets.

As you can see from the fiddle, I easily retrieved all of the articles that are of asset_type topic and topic name of Relationships.

The issue I'm having is:

I cannot assign an article to having an association of "hot" as a topic AND also the celebrity "Denzel Washington".

I know the flaw is in my database structure. Here's my stream of thoughts on my database structure:

  • The article_assets table contains an asset type and then a foreign key to that data's table primary id (asset_id)
  • This creates a problem since I can only reference the specific asset type and the corresponding primary key id.

I need to figure out a solution to allowing multiple references to data and I don't really know how to modify my database structure from here.

Thanks for the help.

Best Answer

If you were designing from scratch, the structure you have is not ideal... however, it can easily be made to work as it is, with one small change to the structure and a more thorough understanding of what SQL can accomplish through joins.

You need to relax the unique constraint on `article_asset`.

UNIQUE KEY `asset_article` (`asset_id`,`article_id`)

This restricts you to not reusing the numeric asset_id of two different types of assets with the same article, which isn't correct, since asset_id is a value that can come from three different information domains (the primary keys of the three tables). The correct constraint would be this:

UNIQUE KEY `asset_article` (`asset_type`,`asset_id`,`article_id`)

With this, you are only restricted from duplicating the asset_id with the same asset_type for the same article, so this is the correct constraint for that unique index. That's the only actual change you have to make to your schema.

Then, modify your query structure to effectively treat asset_article as if it were multiple different tables, "taa" (topic article assets) and "caa" (celebrity article assets), each containing a distinct subset of the rows in asset_article.

SELECT a.*
  FROM articles a
  JOIN article_assets taa ON taa.article_id = a.id AND taa.asset_type = 'topic' 
  JOIN topics t ON t.id = taa.asset_id 
  JOIN article_assets caa ON caa.article_id = a.id AND caa.asset_type = 'celebrity'  
  JOIN celebrities c ON c.id = caa.asset_id 
 WHERE t.name = 'Family'
   AND c.name = 'Ashton Kutcher';

A logically-equivalent way of expressing this query (above) looks like this (below), which might make it clearer how we are joining both the "taa" and "caa" subsets of article_asset to the articles on the article_id ... or it may make everything much less clear. :)

SELECT a.*
  FROM articles a
  JOIN (article_assets taa JOIN topics t 
                           ON t.id = taa.asset_id 
                           AND taa.asset_type = 'topic') ON taa.article_id = a.id
  JOIN (article_assets caa JOIN celebrities c 
                           ON c.id = caa.asset_id 
                           AND caa.asset_type = 'celebrity') ON caa.article_id = a.id
 WHERE t.name = 'Family'
   AND c.name = 'Ashton Kutcher';

A similar structure could be used to find articles about two different celebrities, two different topics, or 1 celeb + 2 topics + 1 brand or however many you felt like stringing together.

With your sample data, my example queries return nothing, because there is no overlap among the different types of assets associated with each story, but after changing the UNIQUE constraint, you can modify the data so that there are overlaps and the query will find the correct results.

http://sqlfiddle.com/#!2/99e4d/4


On the other hand, if you do decide to go with a different design, I would suggest that the more correct approach involves fewer tables -- not more tables. If you step back and look at the three kinds of "things," you may see that they are all, in fact, one single kind of thing, or at least subtypes of a single kind of thing. There is no reason why celebrities, topics, and brands should be stored separately -- they are all "subjects" of the articles. When "things" in a database are the same "kind of thing," they should all be in one table.

For example, if you had a database of contact information, you wouldn't have an "office phone numbers" table and a "home phone numbers" table, and then a table matching people to phone numbers telling you which table to look in for the phone number... right? That makes no sense at all. This is the same scenario, just slightly less obvious because you didn't model it based on the common thread among these apparently different (but, arguably, actually the same) kinds of things.

I can't give you a textbook phrase to describe what I'm attempting to say, but if you ever need to look at an attribute in a row in order to determine what information domain another attribute in the row represents, you are definitely "doing it wrong."

One important reason for this assertion is that you cannot enforce referential integrity via foreign key constraints when the "foreign key" could reference the primary key of any one of several tables.

Consider this structure:

CREATE TABLE subject (
  id INT NOT NULL AUTO_INCREMENT,
  subject_type ENUM('brand','celebrity','topic') NOT NULL,
  name VARCHAR(255) not null,
  PRIMARY KEY(id)
);

CREATE TABLE article_has_subject (
  article_id INT NOT NULL,
  subect_id INT NOT NULL,
  PRIMARY KEY(article_id, subject_id),
  KEY(subject_id, article_id) /* to optimize joins in either direction */
);

You could replace that ENUM with a subject_type_id and a corresponding subject_type table, and then, if you wish one day that you wish you had created an "places" table so you could find all of the stories about 'Family' and 'Rhode Island' there's no need to change your schema... you just add another subject_type record.

In your current structure, you're also storing a redundant piece of information in every row of article_asset -- the type of the asset -- because you need that to find the table where you should look. Redundant data in a database is rarely a good thing.