Database Design Performance – Summary Table vs Counting Over Tables

Imagine a system where a user can perform many different actions, such as:

  • Making a purchase
  • Liking something
  • Opening a specific view (pretty much every click)
  • Inviting a friend

The actions are stored in tables, for example purchase_log, invite_log, etc. By "log" I mean the action history of our users, not system logs.

These actions then need to be acted upon, for example:

  • Displaying user statistics (for example, when opening the profile view)
  • Counting actions toward achievements (done by a separate service, roughly once a minute for all users)
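
To make the "count over the log tables" option concrete, here is a minimal sketch using Python's built-in sqlite3 module. Only the table name purchase_log comes from the question; the columns, the index name and the two helper functions are assumptions made for illustration, and a real system would likely use a different database engine.

```python
import sqlite3

# Hypothetical schema: only the table name purchase_log is from the question,
# the columns are assumed for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE purchase_log ("
    " id INTEGER PRIMARY KEY,"
    " user_id INTEGER NOT NULL,"
    " created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP)"
)
# An index on user_id lets per-user counts avoid scanning the whole table.
conn.execute("CREATE INDEX idx_purchase_log_user ON purchase_log (user_id)")

# A few sample rows: three purchases by user 1, one by user 2.
conn.executemany(
    "INSERT INTO purchase_log (user_id) VALUES (?)",
    [(1,), (1,), (2,), (1,)],
)
conn.commit()


def purchase_count(conn, user_id):
    """Count one user's purchases on the fly (e.g. for the profile view)."""
    row = conn.execute(
        "SELECT COUNT(*) FROM purchase_log WHERE user_id = ?", (user_id,)
    ).fetchone()
    return row[0]


def purchase_counts_all_users(conn):
    """One grouped query for all users (e.g. the periodic achievements pass)."""
    cur = conn.execute(
        "SELECT user_id, COUNT(*) FROM purchase_log GROUP BY user_id"
    )
    return dict(cur.fetchall())
```

With this approach the log tables stay the single source of truth, and the cost of each read grows with the number of log rows that have to be counted.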

My questions:

  1. Is it bad practice to have a separate table, such as user_statistics, where a simple counter is incremented, instead of counting/searching over the xxx_log tables? This obviously breaks the normalisation rules.
  2. Does the answer change depending on system load / size of application?
  3. Does the answer change when considering transaction locks / concurrency issues?
  4. Does the answer change with SQL vs NoSQL?
  5. Does the answer change with sync vs async systems?
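
For reference, the user_statistics variant from question 1 could look roughly like the sketch below, again with Python's sqlite3 module. The column names and the record_invite helper are hypothetical. The point it illustrates is that the log insert and the counter increment happen in the same transaction, so the two cannot drift apart if one of the statements fails.

```python
import sqlite3

# Hypothetical schema: table names are from the question, columns are assumed.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE invite_log (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL
    );
    CREATE TABLE user_statistics (
        user_id INTEGER PRIMARY KEY,
        invite_count INTEGER NOT NULL DEFAULT 0
    );
    """
)


def record_invite(conn, user_id):
    """Append to the log and bump the counter in a single transaction."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute("INSERT INTO invite_log (user_id) VALUES (?)", (user_id,))
        # Increment inside SQL instead of reading the value into Python first,
        # which avoids the classic lost-update problem.
        cur = conn.execute(
            "UPDATE user_statistics SET invite_count = invite_count + 1"
            " WHERE user_id = ?",
            (user_id,),
        )
        if cur.rowcount == 0:
            # First action for this user: create the statistics row.
            conn.execute(
                "INSERT INTO user_statistics (user_id, invite_count)"
                " VALUES (?, 1)",
                (user_id,),
            )
```

Reads become a single-row lookup, at the price of one extra write per action and a counter row that every action by the same user has to update.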

Best Answer

I think the answer will vary depending upon what you want to achieve.

Since you mention that these actions need to be acted on, analysed, reported and displayed, I would consider it advantageous to keep the data in tables.

The decision to store precomputed summary data versus calculating on the fly depends on data volume, scalability and the accuracy required. A mixture of real-time and batch processing might be a viable and practical approach.
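
As an illustration of the batch side of such a mixture, the sketch below rebuilds a summary table from the log in one transaction, which a scheduled job could run periodically. It uses Python's sqlite3 module. The table names are taken from the question, while the columns and the refresh_purchase_counts function are assumptions, not a prescribed design.

```python
import sqlite3

# Hypothetical schema: table names are from the question, columns are assumed.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE purchase_log (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL
    );
    CREATE TABLE user_statistics (
        user_id INTEGER PRIMARY KEY,
        purchase_count INTEGER NOT NULL DEFAULT 0
    );
    """
)


def refresh_purchase_counts(conn):
    """Rebuild purchase_count for every user from purchase_log."""
    with conn:  # both statements commit together or not at all
        conn.execute("DELETE FROM user_statistics")
        conn.execute(
            "INSERT INTO user_statistics (user_id, purchase_count)"
            " SELECT user_id, COUNT(*) FROM purchase_log GROUP BY user_id"
        )
```

Between two runs the summary can be stale, so this only fits where the accuracy requirement tolerates a delay. For large tables, a full rebuild like this would probably give way to an incremental update over rows added since the last run.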

I would expect performance to be an important consideration. Do end users need to see their stats updated immediately? Or are you collecting this data to drive marketing analysis, which can happen periodically at a later date?

Go back to your requirements. What data do you need? Who needs it? When do they need it? How fast and how often? Do they need raw data or aggregated data, detail or summary? How much data are you expecting to collect? How long do you need to retain it? What are the storage requirements, and how will that perform?

If you can answer the questions above, you can narrow down your choices, and your remaining options will be clearer.