Postgresql – Sum duration for each day of a month

functionspostgresqltimestamp

Question: I would like to create a query that can return a list of all days of the month with a sum of all duration for each day including zero for each day that has no data.

Database: PostgreSQL 9.x

Table structure (only relevant columns shown):

-------------------------------------------------------------
| id | start_at            | finish_at           | duration |
-------------------------------------------------------------
| 1  | 2015-08-14 14:01:00 | 2015-08-15 13:59:00 | 86280    |
-------------------------------------------------------------

I have the following query for grabbing the data I want but the part that is missing is the days between with zero and I'm not entirely sure on the best approach to achieve the desired outcome.

SELECT date_trunc('day', start_at) AS "day" , sum(duration) AS "duration"
FROM time_logs
JOIN users ON time_logs.user_id = users.id
JOIN account_users ON account_users.user_id = users.id
JOIN categories ON categories.id = time_logs.activity_id
WHERE account_users.account_id IN (1,2,3)
AND time_logs.start_at BETWEEN '2017-03-01 00:00:00 +1000' AND '2017-03-31 00:00:00 +1000'
AND categories.report_group = 'Segment1'
GROUP BY 1 
ORDER BY 1

Best Answer

Add an outer join to a complete series of days using generate_series():

SELECT day, COALESCE(duration, 0) AS duration
FROM  (  -- your original query, just added table aliases to reduce noise
         -- and fixed the upper bound of your time range
   SELECT date_trunc('day', t.start_at) AS day
        , sum(duration) AS duration  -- add table qualification to duration
   FROM   time_logs     t
   JOIN   users         u  ON t.user_id = u.id
   JOIN   account_users a  ON a.user_id = u.id
   JOIN   categories    c  ON c.id = t.activity_id
   WHERE  a.account_id IN (1,2,3)
   AND    t.start_at >='2017-03-01 00:00 +1'
   AND    t.start_at < '2017-04-01 00:00 +1' -- to get the whole month
   AND    c.report_group = 'Segment1'
   GROUP  BY 1 
   ) sub
RIGHT  JOIN generate_series(timestamptz '2017-03-01 00:00 +1'
                                      , '2017-03-31 00:00 +1'  -- !
                                      , '1 day') day USING (day)
ORDER  BY 1;

And COALESCE to get 0 ("zero") instead of NULL, if you need that.

BETWEEN '2017-03-01 00:00 +1' AND '2017-03-31 00:00 +1' would exclude the last day of march (except for its very first µs). I changed it to include the whole month. AND t.start_at < '2017-04-01 00:00 +1' has the additional advantage that you don't need to adapt the day of the upper bound, just the month.

Since start_at is obviously timestamptz, that's the most efficient call with matching data type. Detailed explanation here:

The actual data type of start_at and your actual time zone setting might lead to disagreement when days start. Consider: