I am making a small program where users make posts or write blog entries. On those posts, other users can like or dislike the post, as on Facebook, or upvote or downvote it, as on Stack Overflow. I would like to know a good, commonly used database structure that the program can work with efficiently. I have two options.
First
Post:
id head message datepost likes dislikes
1 ab anchdg DATE 1,2,3 7,55,44,3
In the above way, id is the post ID. In the likes column, 1,2,3 are the IDs of the users who liked or upvoted the post or blog, and 7,55,44,3 are the IDs of the users who disliked or downvoted it.
Second
Post:
id head message datepost
1 ab anchdg DATE
Likes:
id postid userid
1 1 1
2 2 2
Dislikes:
id postid userid
1 1 7
2 1 55
In this way, I have to create two separate tables, Likes and Dislikes, to get a post's likes. These tables will fill up heavily, which might make them large and processing slow.
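For reference, the second structure can be sketched as SQL DDL (a minimal sketch using SQLite; table and column names are taken from the example above, and the sample data is illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Post (
    id       INTEGER PRIMARY KEY,
    head     TEXT,
    message  TEXT,
    datepost TEXT
);
CREATE TABLE Likes (
    id     INTEGER PRIMARY KEY,
    postid INTEGER REFERENCES Post(id),
    userid INTEGER
);
CREATE TABLE Dislikes (
    id     INTEGER PRIMARY KEY,
    postid INTEGER REFERENCES Post(id),
    userid INTEGER
);
""")

# With this layout, getting a post's likes is a simple aggregate query.
conn.execute("INSERT INTO Post (id, head, message, datepost) "
             "VALUES (1, 'ab', 'anchdg', '2024-01-01')")
conn.executemany("INSERT INTO Likes (postid, userid) VALUES (?, ?)",
                 [(1, 1), (1, 2), (1, 3)])
like_count = conn.execute(
    "SELECT COUNT(*) FROM Likes WHERE postid = 1").fetchone()[0]
print(like_count)  # 3
```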
So, I would like to know: which is the better and more standard way to achieve this task?
Best Answer
The problem you face is addressed by the "normal forms" of relational databases, especially the first normal form: https://en.wikipedia.org/wiki/First_normal_form.
Your database with the concatenated user IDs (first version) is not in first normal form.
See https://en.wikipedia.org/wiki/Database_normalization for why and how normalisation is generally considered good.
In your first example, the query for "user 4 does not like the post anymore" becomes complicated: it has to do string operations and handle corner cases (the user is the only "liking" user, the last one in the string, or somewhere in the middle). I would find this bad. Don't do it. Use a normalized design.
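To illustrate the difference (a sketch in Python, assuming the column layouts from the question): in the normalized design, "user 4 unlikes post 1" is a single DELETE statement, whereas in the concatenated design you must parse and rebuild the string yourself, covering every corner case.

```python
# Normalized design: one statement, no corner cases.
#   DELETE FROM Likes WHERE postid = 1 AND userid = 4;

# Concatenated design: parse the string, drop the ID, rebuild the string.
def remove_like(likes: str, user_id: int) -> str:
    ids = [s for s in likes.split(",") if s]      # "1,4,3" -> ["1", "4", "3"]
    ids = [s for s in ids if int(s) != user_id]   # drop the unliking user
    return ",".join(ids)                          # "" when no likers remain

print(remove_like("4", 4))      # "" - user was the only liking user
print(remove_like("1,2,4", 4))  # "1,2" - user was the last in the string
print(remove_like("1,4,3", 4))  # "1,3" - user was in the middle
```

Every one of those cases has to be handled (and tested) by hand, and the whole row must be read, rewritten, and written back for each change.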
re: database gets heavy
If you have a post that has 4 million likes, in database design 1 you would have one row with a "likes" column that is at least 4 million characters wide (because you will need a comma as a separator after each ID). You would then have to perform string operations on strings millions of characters long. This performs very poorly and is slow.
On the other hand, databases are designed to handle millions of rows. We have databases with several hundred million rows, and count()-operations are fast. Extremely fast. So no, this will not be a performance bottleneck.
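As a sketch of why counting rows stays fast (SQLite here; the index name and sample data are illustrative): with an index on postid, counting one post's likes is an index lookup over just that post's entries, not a scan of the whole table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Likes "
             "(id INTEGER PRIMARY KEY, postid INTEGER, userid INTEGER)")
conn.execute("CREATE INDEX idx_likes_postid ON Likes (postid)")

# Likes for 100 posts, 50 users each; the count query for post 1
# only touches post 1's index entries.
conn.executemany(
    "INSERT INTO Likes (postid, userid) VALUES (?, ?)",
    [(post, user) for post in range(1, 101) for user in range(1, 51)],
)
likes_for_post_1 = conn.execute(
    "SELECT COUNT(*) FROM Likes WHERE postid = ?", (1,)
).fetchone()[0]
print(likes_for_post_1)  # 50
```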
The next issue would be readability and maintainability.
For example, tell me what these two statements do: