Postgresql – Sum duration for each day of a month

Tags: functions, postgresql, timestamp

Question: I would like to create a query that returns every day of a month with the sum of all durations for that day, including zero for each day that has no data.

Database: PostgreSQL 9.x

Table structure (only relevant columns shown):

-------------------------------------------------------------
| id | start_at            | finish_at           | duration |
-------------------------------------------------------------
| 1  | 2015-08-14 14:01:00 | 2015-08-15 13:59:00 | 86280    |
-------------------------------------------------------------

The following query returns the data I want, but days without any rows are missing from the result, and I'm not sure of the best way to include them with a duration of zero.

SELECT date_trunc('day', start_at) AS "day" , sum(duration) AS "duration"
FROM time_logs
JOIN users ON time_logs.user_id = users.id
JOIN account_users ON account_users.user_id = users.id
JOIN categories ON categories.id = time_logs.activity_id
WHERE account_users.account_id IN (1,2,3)
AND time_logs.start_at BETWEEN '2017-03-01 00:00:00 +1000' AND '2017-03-31 00:00:00 +1000'
AND categories.report_group = 'Segment1'
GROUP BY 1 
ORDER BY 1

Best Answer

Add an outer join to a complete series of days using generate_series():

SELECT day, COALESCE(duration, 0) AS duration
FROM  (  -- your original query, just added table aliases to reduce noise
         -- and fixed the upper bound of your time range
   SELECT date_trunc('day', t.start_at) AS day
        , sum(duration) AS duration  -- add table qualification to duration
   FROM   time_logs     t
   JOIN   users         u  ON t.user_id = u.id
   JOIN   account_users a  ON a.user_id = u.id
   JOIN   categories    c  ON c.id = t.activity_id
   WHERE  a.account_id IN (1,2,3)
   AND    t.start_at >='2017-03-01 00:00 +1'
   AND    t.start_at < '2017-04-01 00:00 +1' -- to get the whole month
   AND    c.report_group = 'Segment1'
   GROUP  BY 1 
   ) sub
RIGHT  JOIN generate_series(timestamptz '2017-03-01 00:00 +1'
                                      , '2017-03-31 00:00 +1'  -- !
                                      , '1 day') day USING (day)
ORDER  BY 1;

COALESCE returns 0 ("zero") instead of NULL for days without rows, if you need that.
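As a minimal, self-contained sketch of the same pattern, the following query uses hypothetical inline sample data (the logs CTE and its values are made up for illustration) instead of the real tables. It swaps the date_trunc() equality join for a half-open range join against each generated day, so it is an alternative formulation rather than the query above:

```sql
-- Hypothetical sample rows standing in for time_logs (assumption, not real data).
WITH logs(start_at, duration) AS (
   VALUES (timestamp '2017-03-01 09:00', 3600)
        , (timestamp '2017-03-01 15:30', 1800)
        , (timestamp '2017-03-03 08:00', 7200)
)
SELECT d.day::date AS day
     , COALESCE(sum(l.duration), 0) AS duration  -- 0 for days without rows
FROM   generate_series(timestamp '2017-03-01'
                     , timestamp '2017-03-04'
                     , interval '1 day') AS d(day)
LEFT   JOIN logs l ON l.start_at >= d.day
                  AND l.start_at <  d.day + interval '1 day'
GROUP  BY d.day
ORDER  BY d.day;

--     day     | duration
-- ------------+----------
--  2017-03-01 |     5400
--  2017-03-02 |        0
--  2017-03-03 |     7200
--  2017-03-04 |        0
```

Because the series is the left side of the join, every generated day survives even when no log row matches it.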

BETWEEN '2017-03-01 00:00 +1' AND '2017-03-31 00:00 +1' would exclude the last day of March (except for its very first µs). I changed the upper bound to include the whole month. AND t.start_at < '2017-04-01 00:00 +1' has the additional advantage that you only need to adapt the month of the upper bound, never the day. The upper bound of generate_series() stays at '2017-03-31 00:00 +1' because the series includes its end point.
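The difference between the two range conditions can be checked on a few hand-picked timestamps. This is only an illustrative sketch; the four sample values are chosen to sit around the boundaries and do not come from the original data:

```sql
SELECT ts
     , (ts BETWEEN timestamptz '2017-03-01 00:00 +1'
               AND timestamptz '2017-03-31 00:00 +1') AS in_between
     , (ts >= timestamptz '2017-03-01 00:00 +1'
        AND ts < timestamptz '2017-04-01 00:00 +1')   AS in_half_open
FROM  (VALUES (timestamptz '2017-03-31 00:00:00 +1')   -- first instant of March 31
            , (timestamptz '2017-03-31 00:00:01 +1')   -- one second later
            , (timestamptz '2017-03-31 23:59:59 +1')   -- end of March 31
            , (timestamptz '2017-04-01 00:00:00 +1')   -- first instant of April
      ) AS v(ts);

-- in_between   is true only for the first row.
-- in_half_open is true for the first three rows and false for the last.
```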

Since start_at is evidently timestamptz, passing a timestamptz start value to generate_series() is the most efficient call, because the data types match. Detailed explanation here:

Generating time series between two dates in PostgreSQL

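The actual data type of start_at and your session's time zone setting might lead to disagreement about when days start. date_trunc('day', ...) on a timestamptz value truncates to midnight in the session's time zone, while the generated series starts at midnight at offset +1. If the two differ, the join on day finds no matching rows.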
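To illustrate how the time zone setting moves day boundaries, here is a small sketch that truncates the same instant under two settings. The zone name 'Europe/Berlin' is an assumption chosen only because it is at offset +1 on that date; substitute your own zone:

```sql
BEGIN;

SET LOCAL timezone = 'UTC';
SELECT date_trunc('day', timestamptz '2017-03-01 00:30 +1') AS day;
-- 2017-02-28 00:00:00+00  (the instant still belongs to Feb 28 in UTC)

SET LOCAL timezone = 'Europe/Berlin';  -- assumed zone, offset +1 in March
SELECT date_trunc('day', timestamptz '2017-03-01 00:30 +1') AS day;
-- 2017-03-01 00:00:00+01

-- Pinning the zone explicitly makes the result independent of the setting;
-- AT TIME ZONE turns the timestamptz into a local timestamp first.
SELECT date_trunc('day', timestamptz '2017-03-01 00:30 +1'
                         AT TIME ZONE 'Europe/Berlin') AS local_day;
-- 2017-03-01 00:00:00

ROLLBACK;
```

If you truncate with an explicit zone like this, the generated series would need to be built as local timestamps in the same zone so that both sides of the join agree.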

Related Solutions

Postgresql – GROUP BY very slow when returning many rows

Off the top of my head, you can reduce the number of cycles by wrapping the query in a subquery that computes the derived columns once, and then grouping on the wrapper's output. For example:

zabbix=> \copy (
select hostid,
    sum(value) as value,
    extractDay,
    extractMonth,
    2015 as intYear,
    'DailyExemptUsage' as DailyUsage
from
(
select 
    h.hostid, 
    value, 
    extract(day from to_timestamp(clock)) as extractDay, 
    extract(month from to_timestamp(clock)) as extractMonth
from 
   hosts h, 
   items i, 
   history_uint hu
where 
   h.hostid = i.hostid
   and i.itemid = hu.itemid
   and i.name like 'Air%bound%Tot%'
   and clock between 1430463600 and 1432969200 
   and extract(hour from to_timestamp(clock)) between 1 and 5
   and h.name like '172.xxx.%'
) as source
group by 
   hostid, 
   extractDay,
   extractMonth
) 
to '/home/me/my_file' with csv;

The aggregate belongs in the outer query, next to the GROUP BY; the inner query only prepares the columns. Note that psql reads a \copy command up to the end of the line, so enter it on a single line; it is wrapped here only for readability.

Postgresql – Aggregate sales of the past 12 months for the current row date

Something like this should work:

-- IN A CTE
-- Grab the idclient, and the monthly range needed
-- We need the range because you can't sum over NULL (yet, afaik).
WITH idclient_month AS (
  SELECT idclient, month_transac
  FROM (
    SELECT idclient, min(month_transac), max(month_transac)
    FROM foo
    GROUP BY idclient
  ) AS t
  CROSS JOIN LATERAL generate_series(min::date, max::date, '1 month')
    AS gs(month_transac)
)
-- If we move this where clause down the rows get filtered /before/ the window function
SELECT *
FROM (

  SELECT
    idclient,
    month_transac,
    monthly_sales,
    sum(monthly_sales) OVER (
      PARTITION BY idclient
      ORDER BY month_transac
      ROWS 12 PRECEDING
    )
      - monthly_sales
      AS sales_ttm

  -- Here, we sum up the sales by idclient, and month
  -- We coalesce to 0 so we can use this in a window function
  FROM (
    SELECT idclient, month_transac, coalesce(sum(sales), 0) AS monthly_sales
    FROM foo
    RIGHT OUTER JOIN idclient_month
      USING (idclient,month_transac)
    GROUP BY idclient, month_transac
    ORDER BY idclient, month_transac
  ) AS t

) AS g
WHERE g.monthly_sales > 0;

Here we:

1. Calculate the date range for each idclient in a CTE.

SELECT idclient, month_transac
FROM (
  SELECT idclient, min(month_transac), max(month_transac)
  FROM foo
  GROUP BY idclient
) AS t
CROSS JOIN LATERAL generate_series(min::date, max::date, '1 month')
  AS gs(month_transac)

 idclient  |     month_transac      
-----------+------------------------
 511656A75 | 2010-06-01 00:00:00-05
 511656A75 | 2010-07-01 00:00:00-05
 511656A75 | 2010-08-01 00:00:00-05
 511656A75 | 2010-09-01 00:00:00-05
 511656A75 | 2010-10-01 00:00:00-05
 511656A75 | 2010-11-01 00:00:00-05
 511656A75 | 2010-12-01 00:00:00-06
 511656A75 | 2011-01-01 00:00:00-06
 [....]

2. RIGHT OUTER JOIN that CTE to our sample dataset. This grows the sample dataset so that it has entries with monthly_sales = 0 where needed.
3. Use a window function over a frame of ROWS 12 PRECEDING. That's the key: the frame covers the current row and the 12 rows before it, and subtracting the current row's monthly_sales leaves the past 12 months. The frame counts rows, not dates, so every month needs a row, and we set missing months to 0 before we get to this step.
4. Select just the rows where monthly_sales > 0. We have to do this after the window function so as not to muck with what is available for calculation (the window).

Output:

 idclient  |     month_transac      | monthly_sales | sales_ttm 
-----------+------------------------+---------------+-----------
 511656A75 | 2010-06-01 00:00:00-05 |         68.57 |      0.00
 511656A75 | 2010-07-01 00:00:00-05 |         88.63 |     68.57
 511656A75 | 2010-08-01 00:00:00-05 |         94.91 |    157.20
 511656A75 | 2010-09-01 00:00:00-05 |         70.66 |    252.11
 511656A75 | 2010-10-01 00:00:00-05 |         28.84 |    322.77
 511656A75 | 2015-10-01 00:00:00-05 |         85.00 |      0.00
 511656A75 | 2015-12-01 00:00:00-06 |        114.42 |     85.00
 511656A75 | 2016-01-01 00:00:00-06 |        137.08 |    199.42
 511656A75 | 2016-03-01 00:00:00-06 |        172.92 |    336.50
 511656A75 | 2016-04-01 00:00:00-05 |        125.00 |    509.42
 511656A75 | 2016-05-01 00:00:00-05 |        127.08 |    634.42
 511656A75 | 2016-06-01 00:00:00-05 |        104.17 |    761.50
 511656A75 | 2016-07-01 00:00:00-05 |         98.22 |    865.67
 511656A75 | 2016-08-01 00:00:00-05 |         37.08 |    963.89
 511656A75 | 2016-10-01 00:00:00-05 |        108.33 |   1000.97
 511656A75 | 2016-11-01 00:00:00-05 |        104.17 |   1024.30
 511656A75 | 2017-01-01 00:00:00-06 |        201.67 |   1014.05
(17 rows)
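The frame arithmetic used above can be checked on a tiny series. This sketch uses made-up values and a frame of 2 rows instead of 12 to keep the numbers easy to follow. It also shows an alternative frame, ROWS BETWEEN 2 PRECEDING AND 1 PRECEDING, which excludes the current row directly but yields NULL where no preceding row exists:

```sql
-- Hypothetical monthly totals; month 2 is a gap already filled with 0.
WITH m(month, sales) AS (
   VALUES (1, 10), (2, 0), (3, 5), (4, 20)
)
SELECT month
     , sales
       -- current row plus 2 preceding rows, minus the current row
     , sum(sales) OVER (ORDER BY month ROWS 2 PRECEDING) - sales AS prev_2
       -- same window without the subtraction; NULL for the first row
     , sum(sales) OVER (ORDER BY month
                        ROWS BETWEEN 2 PRECEDING AND 1 PRECEDING) AS prev_2_alt
FROM   m
ORDER  BY month;

--  month | sales | prev_2 | prev_2_alt
-- -------+-------+--------+------------
--      1 |    10 |      0 |
--      2 |     0 |     10 |         10
--      3 |     5 |     10 |         10
--      4 |    20 |      5 |          5
```

If the row for month 2 were missing instead of 0, the frame for month 4 would reach back to month 1 and report 15 rather than 5, which is why the gaps are filled before the window function runs.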
