Joining on a M-N lookup table vs keyword searching a single column

performancequery-performanceschema

I am in the process of writing an image gallery webapp similar to that of Danbooru, and in the process, decided to take a look at danbooru's database schema for an idea about how things are structured. In doing so, I've noticed that the 'uploads' table contains a column named tag_string, which I'm guessing contains comma-separated tags. There's also a tags table which (again, by the looks of it) contains metadata about individual tags.

This got me thinking about the practicality of my initial design whereby I have a M-N lookup table specifically for mapping N individual tags to M different files. My design would use SQL Join to get a result set containing all of the images I'm looking for, while I can forsee danbooru's solution using a search on the text column containing the list of tags associated with that image.

My lookup table could concievably get very large (in terms of # of rows) if every file contains upwards of 20 or more tags, whereas the other solution only extends the size of the files table by one column.

With these differences in mind, what are the pros and cons of the different implementations?

For reference, here is an incomplete (but contains enough information regarding this question) design of my schema.

enter image description here

Best Answer

It is best to store your tags in a separate table because of the many to many relationship. With something like tagging images you can end up with a very large number of comma separated values that would need to be parsed any time you search your database which will lead to massive performance problems as your data grows.

Going that route and searching for a tag or partial tag will require you to parse through multiple strings multiple times. But if all you are doing is displaying the data to be read then comma separation is a fantastic idea and it would be much faster than joining to another table.

Here is great explanation on why, in general, comma separation is not a good practice.

--EDIT

After looking through the githuh link you have posted and the web site, I think Danbooru is doing a combination of two things. If I had to quickly guess, they are keeping the comma separated values of tags to make it easy and fast to view the text upon a mouse hover. They also seem to be inserting the tags into a well structured set of tables. It may be nice to keep that in mind for your practices but I would recommend leaving that out if you can.