Denormalization Strategies: When is it appropriate to create redundancy for faster querying

denormalizationnormalizationscalability

So I sometimes see databases purposely add redundant information for faster querying. I saw that this was mentioned here but I was wondering when would this be better than indexing, or creating a view? Or perhaps it is never better?

I know there are definitely scaling issues with adding this redundancy, but is there a time when it is more ideal to sacrifice scalability for immediate speed gains?

Best Answer

It seems that denormalization is done on a case-by-case basis. If you do decide to denormalize your tables for efficiency, make sure you are thoroughly testing your queries. As @Pieter Geerkens mention

Every redundancy made for for the purpose of one specific query will slow down all other queries, inserts, and updates against the same data.

That is denormalizing while may speed up some gains, there are very huge costs in not only speeds of other queries, but also in scalability. It's important to take these all into consideration along with other ways to improve performance.

In short, I'll end it with a quote from Chris Date who wrote the book Database in Depth: Relational Theory for Practitioners:

I believe firmly that anything less than a fully normalized design is strongly contraindicated ... [Y]ou should "denormalize" only as a last resort. That is, you should back off from a fully normalized design only if all other strategies for improving performance have somehow failed to meet requirements.

Extended Reading:

  1. Scaling Secret #2: Denormalizing Your Way To Speed And Profit
  2. Database War Stories #3: Flickr