PostgreSQL – Slow performance on averaging query

optimization | performance | postgresql | query-performance

I am fairly new to DBMS and backend management, and was hoping to get some advice on the following issue.

We have a Postgres database with the following tables: members and activity_scores. Each activity_scores row belongs to a member, and new rows are written every 10 minutes. Here is an example of the rows of the activity_scores table (indexed on member_id):

 id | member_id | score |         created_at         |         updated_at         
----+-----------+-------+----------------------------+----------------------------
  1 |         8 |    73 | 2016-11-04 05:32:55.564235 | 2016-11-04 05:32:55.564235
  2 |        10 |    20 | 2016-11-04 05:22:55.564235 | 2016-11-04 05:22:55.564235

The activity_scores table can hold anywhere from tens of thousands to millions of rows per member.

The query we are trying to run essentially takes the scores for a member and averages them, grouped by date (in the member's time zone):

SELECT TO_CHAR((activity_scores.created_at AT TIME ZONE ? AT TIME ZONE members.zone),
         ?) AS date,
         AVG(score) AS average_score
FROM "activity_scores"
INNER JOIN "members"
    ON "members"."id" = "activity_scores"."member_id"
WHERE "activity_scores"."member_id" = $1
GROUP BY  date

An example result:

    date    |     average_score      
------------+------------------------
 2016/10/15 | 52.00000000000000000000
 2016/10/29 | 60.25000000000000000000
 2016/09/05 | 70.05000000000000000000

These values are then used for graphing.

Unfortunately, the above query takes a long time (sometimes up to a few minutes), and causes the entire server (hosted on Heroku) to time out.

There isn't really a need for us to keep the data in 10-minute increments for months. Therefore, I've thought about making a separate table that just stores the daily averages, which we could poll directly for graphing purposes. However, this is somewhat of a duplication of data (which is allegedly a no-no in database design).

Would you guys have any suggestions on how to handle something like this?

Thank you!

Best Answer

If you are only inserting into the table, old rows are never deleted or updated, and you don't insert rows for past dates, I see no harm at all in materializing the query results in another table. Each day, you could aggregate the previous day's results for each member. Index the new table properly and performance should be good.
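A minimal sketch of what that could look like. The table and column names (daily_average_scores, score_date) are illustrative, not from the original post, and this assumes created_at is stored in UTC (matching the AT TIME ZONE conversion in the original query) and PostgreSQL 9.5+ for ON CONFLICT:

```sql
-- Hypothetical summary table; the composite primary key also serves
-- as the index for per-member lookups.
CREATE TABLE daily_average_scores (
    member_id     integer NOT NULL REFERENCES members (id),
    score_date    date    NOT NULL,   -- the date in the member's time zone
    average_score numeric NOT NULL,
    PRIMARY KEY (member_id, score_date)
);

-- Run once a day (e.g. from cron or a scheduled worker) to aggregate
-- the previous day's raw rows for every member.
INSERT INTO daily_average_scores (member_id, score_date, average_score)
SELECT a.member_id,
       (a.created_at AT TIME ZONE 'UTC' AT TIME ZONE m.zone)::date AS score_date,
       AVG(a.score)
FROM activity_scores a
JOIN members m ON m.id = a.member_id
WHERE a.created_at >= date_trunc('day', now() AT TIME ZONE 'UTC') - interval '1 day'
  AND a.created_at <  date_trunc('day', now() AT TIME ZONE 'UTC')
GROUP BY a.member_id, score_date
ON CONFLICT (member_id, score_date)
    DO UPDATE SET average_score = EXCLUDED.average_score;
```

Note that the day boundary here is a UTC day; rows near midnight in a member's local time zone can land on the adjacent local date, so the exact cutoff logic may need adjusting to your requirements.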

This is not duplicated data; it is summary data. Since calculating the averages on the fly is not fast enough, you are justified in pre-calculating them and storing them in a summary table.
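The graphing query then becomes a simple indexed lookup against the summary table (again assuming the hypothetical daily_average_scores table sketched above):

```sql
-- Reads only the pre-aggregated rows for one member;
-- the (member_id, score_date) primary key makes this an index scan.
SELECT TO_CHAR(score_date, 'YYYY/MM/DD') AS date,
       average_score
FROM daily_average_scores
WHERE member_id = $1
ORDER BY score_date;
```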

Only if a member's members.zone changes would you have to recalculate the results for that member.
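That recalculation could be sketched as a delete-and-rebuild for the affected member, run in one transaction (same assumptions as above: a hypothetical daily_average_scores table and created_at stored in UTC):

```sql
BEGIN;

-- Drop the stale summaries for this member...
DELETE FROM daily_average_scores
WHERE member_id = $1;

-- ...and rebuild them from the raw rows under the new time zone.
INSERT INTO daily_average_scores (member_id, score_date, average_score)
SELECT a.member_id,
       (a.created_at AT TIME ZONE 'UTC' AT TIME ZONE m.zone)::date AS score_date,
       AVG(a.score)
FROM activity_scores a
JOIN members m ON m.id = a.member_id
WHERE a.member_id = $1
GROUP BY a.member_id, score_date;

COMMIT;
```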