My background is more in web programming rather than database administration, so please correct me if I'm using the wrong terminology here. I'm trying to figure out the best way to design the database for an application I'll be coding.
The situation: I've got Reports in one table and Recommendations in another table. Each Report can have many Recommendations. I also have a separate table for Keywords (to implement tagging). However, I want to have just one set of keywords that gets applied to both Reports and Recommendations so that searching on keywords gives you Reports and Recommendations as results.
Here's the structure I started out with:
Reports
----------
ReportID
ReportName
Recommendations
----------
RecommendationID
RecommendationName
ReportID (foreign key)
Keywords
----------
KeywordID
KeywordName
ObjectKeywords
----------
KeywordID (foreign key)
ReportID (foreign key)
RecommendationID (foreign key)
Instinctively, I feel like this isn't optimal and that I should have my taggable objects inherit from a common parent, and have that comment parent be tagged, which would give the following structure:
BaseObjects
----------
ObjectID (primary key)
ObjectType
Reports
----------
ObjectID_Report (foreign key)
ReportName
Recommendations
----------
ObjectID_Recommendation (foreign key)
RecommendationName
ObjectID_Report (foreign key)
Keywords
----------
KeywordID (primary key)
KeywordName
ObjectKeywords
----------
ObjectID (foreign key)
KeywordID (foreign key)
Should I go with this second structure? Am I missing any important concerns here? Also, if I do go with the second, what should I use as a non-generic name to replace "Object"?
Update:
I'm using SQL Server for this project. It's an internal application with a small number of non-concurrent users, so I don't anticipate a high load. In terms of usage, the keywords will likely be used sparingly. It's pretty much just for statistical reporting purposes. In that sense, whatever solution I go with will probably only affect any developers that will need to maintain this system down the line… but I figured it's good to implement good practices whenever I can. Thanks for all the insight!
Best Answer
The problem with your first example is the tri-link table. Is that going to require one of the foreign keys on either report or recommendations to always be NULL so that keywords link only one way or the other?
In the case of your second example, the joining from the base to the derived tables now may require use of the type selector or LEFT JOINs depending on how you do it.
Given that, why not just make it explicit and eliminate all the NULLs and LEFT JOINs?
In this scenario when you add something else which needs to be tagged, you just add the entity table and the linkage table.
Then your search results look like this (see there is still type selection going on and turning them into generics at the object results level if you want a single results list):
No matter what, somewhere there is going to be type selection and some kind of branching going on.
If you look at how you would do this in your option 1, it's similar but with either a CASE statement or LEFT JOINs and a COALESCE. As you expand your option 2 with more things being linked, you have to keep adding more LEFT JOINs where things are typically NOT being found (an object that is linked can only have one derived table which is valid).
I don't think there is anything fundamentally wrong with your option 2, and you could actually make it look like this proposal with a use of views.
In your option 1, I have some difficulty seeing why you opted for the tri-link table.