Mysql – Need advice for a ‘tag’ table design

database-designinnodbmaintenanceMySQLperformance

My website has different content types. All content types should be tag-able (except if specified in the model, whatever). I can think of 2 options for my design:

2 tables option : 1 table for the tags (+one other, its i18n content) plus 1 table for the association (content_type , content_id, tag_id).

Pros :

  • Creation is faster
  • model parts for the tag is the same (so inherited model)
  • same thing for controlers.

Potential Cons :

  • Will the performance be poor?
  • will the table be too large soon ?
  • How to query for tags clouds (related to all grouped contents)?

many tables options :
1 table for tags, an association table per type of content.

Pros :

  • Tables will be shorter

Cons :

  • Have to customize each model or write a logic/footprint to query the right table
  • ugly long request / php code to join tags of all content types

Additional Facts :

  • Front side will have cache, tags are set by content writers.
  • There are 10 content types.
  • There won't often be new types of content.
  • Content count is ~500 , will ~double in 3 years.
  • MySQL engine InnoDB.
  • built with yii framework.
  • will query tags for a defined content type
  • will query tags for any type of content (together, one big tag cloud for example).

So my idea was to use the first design (2 tables). But I'm not a db expert …
Would you recommand the second option ? Why ?

Best Answer

I would go for option 1 (two tables).

If you only plan to have 1000 content items (and in 3 years, not now), I don't think you will have many performance problems unless each content item is tagged with a dozen tags where most are unique to that item. And even then I don't think you'll see much of a performance problems. It also simplifies your structure if you have 1 tag table for every content type and then you decide to add/remove/change content types later.

If you're really not sure, do you have the time/resources to build two sample DBs, one based on each structure, populate them with what you think is a realistic load (you can write a script/program to generate & insert test data, right? with "boring" tags and content such as "content_1..content_2000 and tag_1... tag_5000) and actually test which performs better?