Database Design – different objects with shared tagging

best practicesdatabase-designschema

My background is more in web programming rather than database administration, so please correct me if I'm using the wrong terminology here. I'm trying to figure out the best way to design the database for an application I'll be coding.

The situation: I've got Reports in one table and Recommendations in another table. Each Report can have many Recommendations. I also have a separate table for Keywords (to implement tagging). However, I want to have just one set of keywords that gets applied to both Reports and Recommendations so that searching on keywords gives you Reports and Recommendations as results.

Here's the structure I started out with:

Reports
----------
ReportID
ReportName


Recommendations
----------
RecommendationID
RecommendationName
ReportID (foreign key)


Keywords
----------
KeywordID
KeywordName


ObjectKeywords
----------
KeywordID (foreign key)
ReportID (foreign key)
RecommendationID (foreign key)

Instinctively, I feel like this isn't optimal and that I should have my taggable objects inherit from a common parent, and have that comment parent be tagged, which would give the following structure:

BaseObjects
----------
ObjectID (primary key)
ObjectType


Reports
----------
ObjectID_Report (foreign key)
ReportName


Recommendations
----------
ObjectID_Recommendation (foreign key)
RecommendationName
ObjectID_Report (foreign key)


Keywords
----------
KeywordID (primary key)
KeywordName


ObjectKeywords
----------
ObjectID (foreign key)
KeywordID (foreign key)

Should I go with this second structure? Am I missing any important concerns here? Also, if I do go with the second, what should I use as a non-generic name to replace "Object"?

Update:

I'm using SQL Server for this project. It's an internal application with a small number of non-concurrent users, so I don't anticipate a high load. In terms of usage, the keywords will likely be used sparingly. It's pretty much just for statistical reporting purposes. In that sense, whatever solution I go with will probably only affect any developers that will need to maintain this system down the line… but I figured it's good to implement good practices whenever I can. Thanks for all the insight!

Best Answer

The problem with your first example is the tri-link table. Is that going to require one of the foreign keys on either report or recommendations to always be NULL so that keywords link only one way or the other?

In the case of your second example, the joining from the base to the derived tables now may require use of the type selector or LEFT JOINs depending on how you do it.

Given that, why not just make it explicit and eliminate all the NULLs and LEFT JOINs?

Reports
----------
ReportID
ReportName


Recommendations
----------
RecommendationID
RecommendationName
ReportID (foreign key)


Keywords
----------
KeywordID
KeywordName


ReportKeywords
----------
KeywordID (foreign key)
ReportID (foreign key)

RecommendationKeywords
----------
KeywordID (foreign key)
RecommendationID (foreign key)

In this scenario when you add something else which needs to be tagged, you just add the entity table and the linkage table.

Then your search results look like this (see there is still type selection going on and turning them into generics at the object results level if you want a single results list):

SELECT CAST('REPORT' AS VARCHAR(15)) AS ResultType
    ,Reports.ReportID AS ObjectID
    ,Reports.ReportName AS ObjectName
FROM Keywords
INNER JOIN ReportKeywords
    ON ReportKeywords.KeywordID = Keywords.KeywordID
INNER JOIN Reports
    ON Reports.ReportID = ReportKeywords.ReportID
WHERE Keywords.KeywordName LIKE '%' + @SearchCriteria + '%'
UNION ALL
SELECT 'RECOMMENDATION' AS ResultType
    ,Recommendations.RecommendationID AS ObjectID
    ,Recommendations.RecommendationName AS ObjectName
FROM Keywords
INNER JOIN RecommendationKeywords
    ON RecommendationKeywords.KeywordID = Keywords.KeywordID
INNER JOIN Recommendations
    ON Recommendations.RecommendationID = RecommendationKeywords.ReportID
WHERE Keywords.KeywordName LIKE '%' + @SearchCriteria + '%'

No matter what, somewhere there is going to be type selection and some kind of branching going on.

If you look at how you would do this in your option 1, it's similar but with either a CASE statement or LEFT JOINs and a COALESCE. As you expand your option 2 with more things being linked, you have to keep adding more LEFT JOINs where things are typically NOT being found (an object that is linked can only have one derived table which is valid).

I don't think there is anything fundamentally wrong with your option 2, and you could actually make it look like this proposal with a use of views.

In your option 1, I have some difficulty seeing why you opted for the tri-link table.