MySQL – Storing vs Calculating Aggregate Values

aggregatedatabase-designMySQL

Are there any guidelines or rules of thumb to determine when to store aggregate values and when to calculate them on the fly?

For example, suppose I have widgets which users can rate (see schema below). Each time I display a widget I could calculate the average user rating from the Ratings table. Alternatively I could store the average rating on the Widget table. This would save me from having to calculate the rating every time I display the widget, but then I'd have to recalculate the average rating each time a user rated a widget.

Ratings       Widgets
---------     -------
widget_id     widget_id
user_id       name              
rating        avg_rating  <--- The column in question

Best Answer

It depends. Pre-calculating aggregate values places a larger load on writes, deriving them makes reads more difficult

If you are frequently accessing a derived value, pre-calculation is a valid de-normalization step. However, in this instance, I recommend using a Materialized View (a view, written to disk, linked by trigger to the parent tables). The materialized view is designed to store frequently asked but tedious-to-derive data, and is useful for high numbers of writes and low numbers of reads.

In a high-write, high-read scenario, consider having a task in the background which mimics the effects of a materialized view, but in less than real-time. This will present a "good enough" average while preserving write and read performance.

In no circumstances, should you treat the derived column like a "normal" column: make sure the data presented in the Widgets "view" is present elsewhere in the table, such that the entire tuple can be derived by whatever processes you emplace. This question is also strongly database (and database-version) specific, so I recommend performance testing of the aggregate (with appropriate indexes) against a normal-sized data set and the materialized view.