Mysql – Likes or votes for posts

database-designMySQLnormalizationstring

I am making a small program where users makes posts or write blogs. On those posts, other users can like or dislike the post as in facebook or upvote or downvote the post as in stackoverflow. I would like to know a good database structure which is commonly used & the program works efficiently with that structure. I have two options

First

Post:

id   head   message   datepost   likes   dislikes
1     ab    anchdg     DATE      1,2,3   7,55,44,3

In the above way, id is the postid. In the likes column, 1,2,3 is the user's id who liked or upvoted the post or blog. 7,55,44,3 is the id of the users who disliked or downvoted the post or blog.

Second

Post:

id    head  message   datepost
1     ab    anchdg     DATE

Likes:

id    postid    userid
1       1         1
2       2         2

Dislikes:

id    postid    userid
1       1         7
2       1         55

In this way, I have to create two separate tables for likes & dislikes to get post's likes. In this way, the tables i.e. Likes & Dislikes will get heavily filled. This might make table heavy & processing slow.

So, I would like to know which is the better & standard way to achieve this task?

Best Answer

The problem you face is known as "Normal forms" of databases, especially the first normal form. https://en.wikipedia.org/wiki/First_normal_form.

Your databse with the concatenated user IDs (first version) is not in first normal form.

See https://en.wikipedia.org/wiki/Database_normalization for why and how normalisation is generally considered good.

In your first example, the query for "user 4 does not like the post anymore" becomes complicated. It will have to do string operations, which will have to consider side effects and corner cases (user is the only "liking" user, user is the last liking user, user is in the middle of the liking user string). I would find this bad. Don't do it. Use a normalized design.

re: database gets heavy

If you have a post that has 4 million likes, in database design 1 you would have one row with a "likes" column that is at least 4 million characters wide (because you'll need the comma as seperator chars). You will then have to perform string operations on four million digit wide strings. This is very unperformant and slow.

On the other hand, databases are designed to handle millions of rows. We have databases with several hundred million rows, and count()-operations are fast. Extremely fast. So no, this will not be a performance bottleneck.

The next issue would be readability and maintainability.

For example, tell me what these 2 statements do:

select count(*)
from posts
inner join likes on posts.postid = likes.postid
where postid = 7

select len(likes) - len(replace(likes, ',', ''))
from posts
where postid = 7