Hoping someone can help me here, as I can't quite wrap my head around the best way to do this.
I'm trying to SUM() some numbers from a JSON block, which I have working just fine. However, there can be multiple records per day per user, and I only want to SUM the numbers from the latest record per day per user.
So, essentially, I want to use the record that matches MAX(created_at) per user, per day.
Some sample data:
id | user_id | scan_id | data | created_at
1 | 1 | 100 | {"score": 40} | 2015-11-06 22:15:27
2 | 1 | 101 | {"score": 50} | 2015-11-06 22:18:27
3 | 3 | 102 | {"score": 20} | 2015-11-06 22:15:27
4 | 3 | 103 | {"score": 70} | 2015-11-06 22:12:27
5 | 5 | 104 | {"score": 40} | 2015-11-06 22:15:27
6 | 6 | 105 | {"score": 10} | 2015-12-06 22:15:27
In the above data, I want to SUM the values from data->'score', but you can see the first 4 rows come from just two users. I only want to use the LATEST record per user in the SUM, so that would be record ids 2 and 3, but not 1 and 4 (as they are older than the other records).
Record 6 would fall under its own day, as it's on a different date.
The query below works, but it does not restrict itself to the latest record. I would like to know how to alter it to only use the latest record per user per day.
SELECT
    SUM((data ->> 'score')::integer) AS score,
    count(*) AS count,
    created_at::date
FROM
    scores
GROUP BY created_at::date
Best Answer
One way (not necessarily the most efficient) is to use the Postgres-specific DISTINCT ON modifier in a subquery, and then sum over its result. Tested at SQLfiddle.
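A minimal sketch of that approach, assuming the scores table and column names from the question and that data is a json or jsonb column. This is a reconstruction for illustration, not necessarily the exact query that was tested; the aliases latest and day are made up here.

```sql
-- Inner query: DISTINCT ON keeps one row per (user_id, day).
-- The ORDER BY must start with the DISTINCT ON expressions; the trailing
-- created_at DESC makes the kept row the latest one for that user and day.
-- Outer query: sum the scores of the kept rows per day.
SELECT
    SUM((data ->> 'score')::integer) AS score,
    COUNT(*) AS count,
    day
FROM (
    SELECT DISTINCT ON (user_id, created_at::date)
        data,
        created_at::date AS day   -- "day" is an illustrative alias
    FROM scores
    ORDER BY user_id, created_at::date, created_at DESC
) AS latest                        -- "latest" is an illustrative alias
GROUP BY day
ORDER BY day;
```

With the sample data from the question, the subquery should keep records 2, 3, 5 and 6, so 2015-11-06 sums to 110 over 3 rows and 2015-12-06 sums to 10 over 1 row.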