MySQL – best way to store tags

database-design, MySQL, web server

I'm a university student building a web application for a project with three colleagues; I took on the database portion.

Our site is basically a link-sharing site that supports tags. What is an efficient way to store tags in a MySQL database? I thought of these approaches:

1- A separate table for tags: the relation between links and tags is many-to-many, so I must break it up with an intermediate table. I can't think of an SQL statement to fetch links with specific tags. For example, the user might search for links tagged with (php, mysql) or (hardware-issues, gigabyte).

Also, even if I did, it needs a join. We're making the site global, which means the database will be very large some day, and I don't think we can bear a join.

2- Putting tags as an array in the links table: I can store an array of strings, which means only one select statement (better than a join, I think), but manipulating the array will be hard, since a link's poster might edit the tags on his/her link. I can figure out a way to deal with this, but I'm not sure it is the most efficient way. Of course it's not normalized and the database will grow faster, but I think on a server speed is more important than size.

So is there a better way?

Best Answer

Why do you think you will not be able to bear a join? If I understand correctly, your other idea is to put a list of tags into a varchar column (am I correct?) and find links by searching the text in that column. That will definitely be much slower, since you will have to full-scan the links table for every search.

Joins are normal, everyday operations in relational databases and are not something you should avoid.

So say you have three tables:

links(link_id, url)
tags(tag_id, tag)
tags_links(tag_id, link_id)

You want to find all links with tags in (php , mysql)?

-- distinct: a link carrying both tags would otherwise appear twice
select distinct l.link_id, l.url
from links l
    inner join tags_links tl on (l.link_id = tl.link_id)
    inner join tags t on (tl.tag_id = t.tag_id)
where t.tag in ('php', 'mysql')
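
The query above matches links that carry either tag. If the search should instead return only links that carry both tags, one common approach is to group by link and count the matched tags. This is a minimal sketch, assuming the three tables listed earlier (with the junction table named tags_links), a unique tags.tag, and a unique (tag_id, link_id) pair:

```sql
-- Sketch: links tagged with BOTH 'php' and 'mysql'.
-- Assumes tags.tag is unique and (tag_id, link_id) is unique in tags_links,
-- so each matched tag contributes exactly one row per link.
select l.link_id, l.url
from links l
    inner join tags_links tl on (l.link_id = tl.link_id)
    inner join tags t on (tl.tag_id = t.tag_id)
where t.tag in ('php', 'mysql')
group by l.link_id, l.url
having count(*) = 2;  -- 2 = number of tags in the search
```

The constant in the having clause has to equal the number of tags in the in (...) list, so the application would need to set both together.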

links.link_id and tags.tag_id must be primary keys. In tags_links you can put the primary key on the combination (tag_id, link_id); the supporting index will make searches by tag_id fast. You can and should also index the link_id column in that table, so that you can find all tags for a given link fast. Also, put a unique index on the tags.tag column so that you can find the tag_id for a tag fast (tags should be unique anyway, so you're enforcing an important business rule with such a constraint and index).
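
The keys and indexes described above could be declared roughly as follows. This is only a sketch: the column types and lengths, the index names, the InnoDB engine choice, and the foreign keys are assumptions added for illustration and are not part of the original answer.

```sql
-- Sketch only: types, lengths, index names and foreign keys are assumptions.
create table links (
    link_id int unsigned not null auto_increment,
    url     varchar(2048) not null,
    primary key (link_id)
) engine = InnoDB;

create table tags (
    tag_id int unsigned not null auto_increment,
    tag    varchar(64) not null,
    primary key (tag_id),
    unique key uq_tags_tag (tag)        -- fast tag -> tag_id lookup, keeps tags unique
) engine = InnoDB;

create table tags_links (
    tag_id  int unsigned not null,
    link_id int unsigned not null,
    primary key (tag_id, link_id),      -- serves "all links for a tag"
    key idx_tags_links_link (link_id),  -- serves "all tags for a link"
    foreign key (tag_id) references tags (tag_id),
    foreign key (link_id) references links (link_id)
) engine = InnoDB;
```

Putting tag_id first in the composite primary key is what lets the same index answer searches by tag; the separate link_id index covers the opposite direction.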

Try that out: fill some imaginary test data into such a schema, say 1M links, 10k tags and 10M tags_links rows, and test the performance. I think you'll be surprised by the results.
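
While testing, MySQL's EXPLAIN statement can show whether the tag search actually uses the indexes rather than scanning whole tables. A sketch, assuming the junction table is named tags_links as in the table list above:

```sql
-- Sketch: inspect the execution plan of the tag search.
-- In the output, the "key" column shows which index (if any) is used for
-- each table, and "rows" gives the optimizer's estimate of rows examined.
explain
select l.link_id, l.url
from links l
    inner join tags_links tl on (l.link_id = tl.link_id)
    inner join tags t on (tl.tag_id = t.tag_id)
where t.tag in ('php', 'mysql');
```

If the plan shows a full scan (type ALL) on a large table, that usually points to a missing index rather than to the join itself being too expensive.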