Design Tagging module in the database

Tags: best practices, database-design, index, schema

I have a database system in which we want to define tags for many tables. For example, we have videos, photos, … different entities that we wish to make taggable. The tags could be anything. By tagging I mean that we want to let users define tags (keywords) for these entities from the front end, something like Facebook tagging. We would like the design to be extensible, so that any entity added to the system later can be made taggable too.

Initially we thought this could be a many-to-many relationship between each entity we want to be taggable and each tagged entity in the database. For example, if we want to tag users in photos, then many users can be tagged in a single photo, and many photos may have the same user tagged in them. But this design complicates things, because we would need a many-to-many relationship between every taggable entity and every object we want to add tags to. So we would have a many-to-many relationship between user and photos, user and video, …, and that is only for the user entity; every other entity would need the same. It is complex and not extensible. So we tried to improve on that and came up with the following idea.

We would have one table with no relationships at all; it is not connected to any other table. It would contain the tag id, the tag value, the entity object id (e.g. user id), the entity object type (e.g. user), the taggable object id (e.g. photo id), and the taggable object type (e.g. photo). We would add indexes on the entity object id and entity object type columns, and on the taggable object id and taggable object type columns, to speed up searches.
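To make the second idea concrete, here is a minimal sketch of that single table, run against an in-memory SQLite database from Python. The table name `tag_link`, the column names, the sample rows, and the choice of composite (type, id) indexes are all assumptions made for illustration; the question only lists the columns informally.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical names: one generic table, no foreign keys to anything.
CREATE TABLE tag_link (
    tag_id        INTEGER PRIMARY KEY,
    tag_value     TEXT    NOT NULL,
    entity_id     INTEGER NOT NULL,  -- e.g. a user id
    entity_type   TEXT    NOT NULL,  -- e.g. 'user'
    taggable_id   INTEGER NOT NULL,  -- e.g. a photo id
    taggable_type TEXT    NOT NULL   -- e.g. 'photo'
);
-- One index per lookup direction (assumed to be composite here).
CREATE INDEX ix_tag_link_entity   ON tag_link (entity_type, entity_id);
CREATE INDEX ix_tag_link_taggable ON tag_link (taggable_type, taggable_id);
""")

INSERT_SQL = (
    "INSERT INTO tag_link "
    "(tag_value, entity_id, entity_type, taggable_id, taggable_type) "
    "VALUES (?, ?, ?, ?, ?)"
)
conn.executemany(INSERT_SQL, [
    ("alice", 1, "user", 10, "photo"),
    ("bob",   2, "user", 10, "photo"),
    ("alice", 1, "user", 20, "video"),
])


def users_tagged_in(taggable_type, taggable_id):
    """Return the ids of users tagged in one taggable object."""
    rows = conn.execute(
        "SELECT entity_id FROM tag_link "
        "WHERE taggable_type = ? AND taggable_id = ? AND entity_type = 'user' "
        "ORDER BY entity_id",
        (taggable_type, taggable_id),
    ).fetchall()
    return [row[0] for row in rows]


photo_10_users = users_tagged_in("photo", 10)

# Because nothing is connected, the database accepts a tag that points
# at a photo id which exists nowhere else.
conn.execute(INSERT_SQL, ("ghost", 99, "user", 12345, "photo"))
```

The last insert shows the trade-off of this layout: a new taggable type needs no schema change, but the database cannot check that the ids refer to real rows, so that check has to live in the application.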

But I was wondering whether there is an established design pattern that solves this problem, and which of the two solutions above is better.

Best Answer

What you describe in your second paragraph sounds like the Entity-Attribute-Value (EAV) pattern.

This is a known pattern for the kind of data you describe. It fits when you need to be extensible: the types are driven by the data, so you can start tracking new types as you grow. It lets you stay flexible and agile, and be a little more forgiving and generic in your database design and schema.

This pattern works, even in relational databases, but it takes more work to get good performance, and it can cause headaches down the line depending on how many rows you have, how many inserts you do, and how frequently you query it. This SO question covers some of the pitfalls nicely in the first answer listed, which has 20 votes. I have seen this model work well, especially in environments where new types and entities really do keep arriving and we want that to be application-driven or user-driven. We fought with performance more often in that model than in a traditional relational model, but we needed the flexibility and we made it work. Definitely look at that SO question, and do some more research on EAV and whatever DBMS you are using to find examples from others who have tried the same.

Related Solutions

Database Relationships overview appreciated

I have no experience with Salesforce, so this is a general answer from a database-design perspective.

I think you've made your model far more complicated than it needs to be. As it stands, you'll have many circular relationships, the possibility of running into infinite loops when traversing relationships, and the possibility of everything linking to everything else (which kind of defeats the purpose).

If I've understood the above correctly, the following is a simplification of your entity relationships:

A program has many projects. (1:N)
Projects have many contacts, and a contact can be assigned to many projects. (M:N)
Agencies have many programs, and a program can belong to many agencies. (M:N)

The above are the "direct relationships."

With this simplification, you can relate Agencies to Contacts through programs and projects. No direct relationship between the two is needed. (For example, Contact John is assigned to Project X, which is part of Program Y, which is administered by Agency Z.) The same goes for contact to contact, and even agency to agency.
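As a rough illustration of relating Agencies to Contacts only through the direct relationships, the sketch below builds the three relationships in an in-memory SQLite database and walks them with joins. All table and column names, and the extra rows (Agency W, Contact Mary), are assumptions for the example; only the John / Project X / Program Y / Agency Z chain comes from the text above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE agency  (agency_id  INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE program (program_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE contact (contact_id INTEGER PRIMARY KEY, name TEXT NOT NULL);

-- Program 1:N Project: the project carries the program key.
CREATE TABLE project (
    project_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    program_id INTEGER NOT NULL REFERENCES program (program_id)
);
-- Agency M:N Program.
CREATE TABLE agency_program (
    agency_id  INTEGER NOT NULL REFERENCES agency (agency_id),
    program_id INTEGER NOT NULL REFERENCES program (program_id),
    PRIMARY KEY (agency_id, program_id)
);
-- Project M:N Contact.
CREATE TABLE project_contact (
    project_id INTEGER NOT NULL REFERENCES project (project_id),
    contact_id INTEGER NOT NULL REFERENCES contact (contact_id),
    PRIMARY KEY (project_id, contact_id)
);

INSERT INTO agency  VALUES (1, 'Z'), (2, 'W');
INSERT INTO program VALUES (1, 'Y');
INSERT INTO project VALUES (1, 'X', 1);
INSERT INTO contact VALUES (1, 'John'), (2, 'Mary');
INSERT INTO agency_program  VALUES (1, 1);
INSERT INTO project_contact VALUES (1, 1);
""")


def contacts_of_agency(agency_name):
    """Derive an agency's contacts by walking programs and projects."""
    rows = conn.execute(
        """
        SELECT DISTINCT contact.name
        FROM agency
        JOIN agency_program  ON agency_program.agency_id   = agency.agency_id
        JOIN project         ON project.program_id         = agency_program.program_id
        JOIN project_contact ON project_contact.project_id = project.project_id
        JOIN contact         ON contact.contact_id         = project_contact.contact_id
        WHERE agency.name = ?
        ORDER BY contact.name
        """,
        (agency_name,),
    ).fetchall()
    return [row[0] for row in rows]


agency_z_contacts = contacts_of_agency("Z")
```

The agency-to-contact link is never stored; it is derived on demand, so it cannot drift out of sync with the direct relationships.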

Database Design – different objects with shared tagging

The problem with your first example is the tri-link table. Is that going to require one of the foreign keys on either report or recommendations to always be NULL so that keywords link only one way or the other?

In the case of your second example, the joining from the base to the derived tables now may require use of the type selector or LEFT JOINs depending on how you do it.

Given that, why not just make it explicit and eliminate all the NULLs and LEFT JOINs?

Reports
----------
ReportID
ReportName


Recommendations
----------
RecommendationID
RecommendationName
ReportID (foreign key)


Keywords
----------
KeywordID
KeywordName


ReportKeywords
----------
KeywordID (foreign key)
ReportID (foreign key)

RecommendationKeywords
----------
KeywordID (foreign key)
RecommendationID (foreign key)

In this scenario when you add something else which needs to be tagged, you just add the entity table and the linkage table.
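A minimal sketch of that step, using an in-memory SQLite database from Python rather than the dialect of the query below. The column types, the sample rows, and the `Videos` / `VideoKeywords` pair (standing in for "something else which needs to be tagged") are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE Reports (
    ReportID   INTEGER PRIMARY KEY,
    ReportName TEXT NOT NULL
);
CREATE TABLE Keywords (
    KeywordID   INTEGER PRIMARY KEY,
    KeywordName TEXT NOT NULL UNIQUE
);
CREATE TABLE ReportKeywords (
    KeywordID INTEGER NOT NULL REFERENCES Keywords (KeywordID),
    ReportID  INTEGER NOT NULL REFERENCES Reports (ReportID),
    PRIMARY KEY (KeywordID, ReportID)
);

-- Hypothetical later addition: one entity table plus one linkage table.
-- Nothing that already exists has to change.
CREATE TABLE Videos (
    VideoID   INTEGER PRIMARY KEY,
    VideoName TEXT NOT NULL
);
CREATE TABLE VideoKeywords (
    KeywordID INTEGER NOT NULL REFERENCES Keywords (KeywordID),
    VideoID   INTEGER NOT NULL REFERENCES Videos (VideoID),
    PRIMARY KEY (KeywordID, VideoID)
);

INSERT INTO Reports  VALUES (1, 'Q1 Budget');
INSERT INTO Keywords VALUES (1, 'finance');
""")


def try_link_report(keyword_id, report_id):
    """Attempt to tag a report; return False if the database rejects it."""
    try:
        conn.execute(
            "INSERT INTO ReportKeywords (KeywordID, ReportID) VALUES (?, ?)",
            (keyword_id, report_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False


valid_link = try_link_report(1, 1)       # report 1 exists
dangling_link = try_link_report(1, 999)  # report 999 does not exist
```

With real foreign keys on each linkage table, the database itself refuses a tag that points at a missing row, and no column ever has to be NULL.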

Then your search looks like this (note that type selection is still going on, and the rows are turned into generic objects at the results level if you want a single results list):

SELECT CAST('REPORT' AS VARCHAR(15)) AS ResultType
    ,Reports.ReportID AS ObjectID
    ,Reports.ReportName AS ObjectName
FROM Keywords
INNER JOIN ReportKeywords
    ON ReportKeywords.KeywordID = Keywords.KeywordID
INNER JOIN Reports
    ON Reports.ReportID = ReportKeywords.ReportID
WHERE Keywords.KeywordName LIKE '%' + @SearchCriteria + '%'
UNION ALL
SELECT 'RECOMMENDATION' AS ResultType
    ,Recommendations.RecommendationID AS ObjectID
    ,Recommendations.RecommendationName AS ObjectName
FROM Keywords
INNER JOIN RecommendationKeywords
    ON RecommendationKeywords.KeywordID = Keywords.KeywordID
INNER JOIN Recommendations
    ON Recommendations.RecommendationID = RecommendationKeywords.RecommendationID
WHERE Keywords.KeywordName LIKE '%' + @SearchCriteria + '%'

No matter what, somewhere there is going to be type selection and some kind of branching going on.

If you look at how you would do this in your option 1, it's similar but with either a CASE statement or LEFT JOINs and a COALESCE. As you expand your option 2 with more things being linked, you have to keep adding more LEFT JOINs where things are typically NOT being found (an object that is linked can only have one derived table which is valid).

I don't think there is anything fundamentally wrong with your option 2, and you could actually make it look like this proposal through the use of views.

In your option 1, I have some difficulty seeing why you opted for the tri-link table.
