Here is my SQLFiddle: http://sqlfiddle.com/#!2/affc4/9
A brief description of the issue:
I have the following data:
- topics
- brands
- celebrities
All 3 of those data types can be associated to an article
.
The associations exist in article_assets
.
As you can see from the fiddle, I easily retrieved all of the articles that are of asset_type
topic and topic name
of Relationships.
The issue I'm having is:
I cannot assign an article to having an association of "hot" as a topic AND also the celebrity "Denzel Washington".
I know the flaw is in my database structure. Here's my stream of thoughts on my database structure:
- The
article_assets
table contains an asset type and then a foreign key to that data's table primary id (asset_id
) - This creates a problem since I can only reference the specific asset type and the corresponding primary key id.
I need to figure out a solution to allowing multiple references to data and I don't really know how to modify my database structure from here.
Thanks for the help.
Best Answer
If you were designing from scratch, the structure you have is not ideal... however, it can easily be made to work as it is, with one small change to the structure and a more thorough understanding of what SQL can accomplish through joins.
You need to relax the unique constraint on `article_asset`.
This restricts you to not reusing the numeric asset_id of two different types of assets with the same article, which isn't correct, since asset_id is a value that can come from three different information domains (the primary keys of the three tables). The correct constraint would be this:
With this, you are only restricted from duplicating the asset_id with the same asset_type for the same article, so this is the correct constraint for that unique index. That's the only actual change you have to make to your schema.
Then, modify your query structure to effectively treat asset_article as if it were multiple different tables, "taa" (topic article assets) and "caa" (celebrity article assets), each containing a distinct subset of the rows in asset_article.
A logically-equivalent way of expressing this query (above) looks like this (below), which might make it clearer how we are joining both the "taa" and "caa" subsets of article_asset to the articles on the article_id ... or it may make everything much less clear. :)
A similar structure could be used to find articles about two different celebrities, two different topics, or 1 celeb + 2 topics + 1 brand or however many you felt like stringing together.
With your sample data, my example queries return nothing, because there is no overlap among the different types of assets associated with each story, but after changing the
UNIQUE
constraint, you can modify the data so that there are overlaps and the query will find the correct results.http://sqlfiddle.com/#!2/99e4d/4
On the other hand, if you do decide to go with a different design, I would suggest that the more correct approach involves fewer tables -- not more tables. If you step back and look at the three kinds of "things," you may see that they are all, in fact, one single kind of thing, or at least subtypes of a single kind of thing. There is no reason why celebrities, topics, and brands should be stored separately -- they are all "subjects" of the articles. When "things" in a database are the same "kind of thing," they should all be in one table.
For example, if you had a database of contact information, you wouldn't have an "office phone numbers" table and a "home phone numbers" table, and then a table matching people to phone numbers telling you which table to look in for the phone number... right? That makes no sense at all. This is the same scenario, just slightly less obvious because you didn't model it based on the common thread among these apparently different (but, arguably, actually the same) kinds of things.
I can't give you a textbook phrase to describe what I'm attempting to say, but if you ever need to look at an attribute in a row in order to determine what information domain another attribute in the row represents, you are definitely "doing it wrong."
One important reason for this assertion is that you cannot enforce referential integrity via foreign key constraints when the "foreign key" could reference the primary key of any one of several tables.
Consider this structure:
You could replace that
ENUM
with a subject_type_id and a corresponding subject_type table, and then, if you wish one day that you wish you had created an "places" table so you could find all of the stories about 'Family' and 'Rhode Island' there's no need to change your schema... you just add another subject_type record.In your current structure, you're also storing a redundant piece of information in every row of article_asset -- the type of the asset -- because you need that to find the table where you should look. Redundant data in a database is rarely a good thing.