Postgresql – How to implement posts with “seen by” like facebook

database-designperformancepostgresql

Inside a Facebook group there is a feature when every time see the group newsfeed it mark all posts as seen by me.

And if I see a specific post inside a group thought a url (facebook.com/groups/123/posts/123), it mark as seen this single post.

enter image description here

My question is about what is the intuitive way and right to model this and performance. I want to know what I am missing or wrong about about to get it right.

What happens on facebook: "When a new member join the group, all the old posts are marked as seen by him". But is it per post or global?

Scenarios:

  1. Mark as seen all individual posts when click group.

Pros:

  • Easy development
  • Easy and fast queries. Could implement a int counter column on each post and a has_many table users_id x post_id
  • Getting and individual post I can tell how many users saw without context

Cons:

  • How to deal with a new member join a group with > 100 posts? Batch writes?

2.Mark as seen the group when click group.

Pros:

  • One single write when seen a group

Any tips and advices are welcome!
Thanks!

Best Answer

I think the main thing to consider here is how you store who have seen what. The most convenient way is to store the post-user pairs. Any solution with less granularity will have some cases hard to handle - your second suggestion, if I understood it correctly, would store a group-user pair, plus a flag that all posts are already seen (I imagine that this would be rather the last post ID at the moment of joining the group). This would only work if you can't miss posts, so that the unseen ones form a contiguous group, all having a bigger ID than all already-seen ones.

Since the user-post pairing consists exactly these two fields, which can be, expecting extremely high user and post numbers, 2×8 bytes per row, I wouldn't expect performance problems with 10,000 users and 100,000 posts (provided there are really groups, I mean, not all users belong to all groups and be able to read all posts - in this case, the table would be approximately 30 GB in size (plus the same size for the index necessary for the primary key). This size can only be handled by appropriate hardware and some magic, like archiving old posts as those are queried rarely and so on. If your system exceeds that size, you probably make nice money of it, being able to beef your system up :)