Which is the better schema design for a voting system

database-designschema

Imagine a q/a site or forum similar to stackoverflow. I'm trying to determine which of these two methods would be the better design for a voting system:
(note: I would also like to keep record of all a users vote decisions; and you can assume the site experiences a relatively high amount of traffic)

  1. create a junction table named VOTE referencing USERS, and POSTS. The problem I see with this is that when a view displays a topic listing all the posts, this seems very inefficient when we want to display each post's "score", since we'd have to count N votes for M number of posts; if the forum is very large this will be a major hindrance.

  2. create the above junction table, but also keep a score on the posts table and simply use this to alleviate the above performance issue. The problem with this is that it describes something that is already described in another table which isn't a good idea either, if im correct?

  3. Use some nosql solution for this part of the application; however this introduces another moving part into the equation and complicates things by having two db technologies working with eachother.

(my solution doesn't have to be one of these three, if you have a fourth or even fifth it is more than welcome, thank you)

(oh, and if it matters in any way, ill be using postgres)

Best Answer

Approach 1 is the cleanest with no redundancy. However I can understand your worries about scanning through a huge table just to count a post votes.

So approach 2 which features some "caching" seams fair. The downside is that you add some redundancy but this is not even the main issue. You also lose consistency. It's possible to have a post with 500 votes while the votes column in the post table only says 300. This is a potential risk.

So it's the choice between a clean and elegant approach and increased performance that though comes along with redundancy and loss of consistency.

In my opinion it'd be OK to go with the 2nd option though it's somewhat crappy. I think though that the performance difference will not be that huge as one could expect. For example every time you're displaying a Question and it's answers you are doing a SELECTION on ALL Answers ever given to all questions. Best way would be to put the 1st and 2nd option to a test on a massive amount of test data but for the time being I'd go with the first option. It is the best and cleanest and performance optimization can be done better elsewhere.

Note you save yourself the time to update 2 places with every vote.