Design Tagging module in the database

best practicesdatabase-designindexschema

I've a database system which we want to define tags for many tables into it. For example I've videos, photos, … different entities that we wish to make them taggable. The tags could be anything. By tagging I mean that we wish to allow the user from the front end to define tags for these entities (keywords), something like Facebook tagging. We would like to make it extensible for any future entity that may appear later in the system and we want it to be taggable too. Initially we thought that this could be a many to many relation ship between each entity that we want it to be taggable and each tagged entity in the database. For example, if we want to tag users in photos, then many users can be tagged in a single photo and many photos may have the same user tagged into them. But this design complicate things because we will need many to many relationship between all the taggable entities and the objects that we want to add tags to. So we will have a many to many relationship between user and photos, user and video, …. and this is only for the user entity, if we have other entity we will do the same, so it's complex and non extensible. So we tried to enhance that and we came with the following idea.

We will have a table with no relationships at all, it's not connected to any other table, this table will contain the following, the tag id, tag value, the entity object id (ex: user id), the entity object type (ex: user), the taggable object id (ex: photo id), the taggable object type (ex: photo). and we will add indexes for entity object id column and entity object type, also for taggable object id and taggable object type to fasten the search process.

But I was wondering if there any pattern for that, some design pattern that solves this issue, and which of the above 2 solutions is better.

Best Answer

What you describe in your second paragraph sounds like the Entity Attribute Value pattern.

This is a known pattern for dealing with the type of data you are talking about. When you need to be able to be extensible, drive the types from the data and allow you to start tracking new types as you grow. It allows you stay flexible and agile and be a little more forgiving and generic in your database design and schema.

This pattern works, even in relational databases, but it does take more work to get good performance, and can cause you headaches down the line depending on how many rows, how many inserts you are doing and how frequently you query it. This SO Question talks about some of the pitfalls nicely in the firs answer listed with 20 votes. I have seen this model work well, especially in environments where the new types and entities really do come in and we want to let that be more application driven or user driven but we were fighting with performance more often in that model than a traditional relational database model but we needed the flexibility and we made it work. Definitely look at that SO question and do some more research on EAV and whatever DBMS you are using to find examples from others who have tried the same.