Optimisation Newbie: How much of a ‘sin’ is redundancy

database-designnormalizationoptimization

I've been developing for a while, but never really had to deal with DB/scaling issues before. That's suddenly changed and I've found myself in the deep end.

I have 2 SQL tables, as such:

VOTES
vote_id (PK)
question_id (FK)
user_id (FK)
option_id (FK) <The option the user voted for>

The 'questions' table looks like this:

QUESTIONS
question_id (PK)
option_1 (FK --> Options)
option_2 (FK --> Options)

The problem I have is that I very often have to retrieve the sum of all option_1 votes (or option_2 votes) for a question. This is currently done by selecting count where question_id = [@question.id] and option_id = [@question.option_1.id]"

I'm guessing it'd be a heckuva lot quicker to just add option_1_votes and option_2_votes columns to the "questions" table and increment them each time a vote is added. But ultimately, that's redundant data.

So, as someone who's pretty clueless re: database design princpiples, what's the rule of thumb here? Would a top DBA just add in the columns, or try some other solution?

Cheers…

Best Answer

Generally, I don't add redundant columns unless I really need too.

Running a COUNT over a set of data is quite efficient in any RDBMS.

Consider this is a read over indexed (hopefully) cached data to get the count will beat the the 2nd write in to maintain the denormlaised column. This write requires more resources/locking/longer transaction etc which impact reads more

If performance becomes an issue over time, then you can pre-calculate the COUNT more efficiently using an indexed (aka materialised) view