Postgresql split date range by business days then aggregate by month

aggregate, postgresql

I have the following rows in a table (dates are in dd-mm-yyyy format).

Start      | End        | Value
---------------------------------
01-01-2019 | 31-03-2019 | 64
01-02-2019 | 30-04-2019 | 126
01-03-2019 | 31-05-2019 | 66

I would like to divide each value by the number of working days (Monday to Friday only; holidays are not taken into account) between its start and end dates, then sum, per month, the amounts that fall in each month, in Postgres.

Based on this year's working days per month being:

Month  | Working Days
--------------------
Jan 19 | 23
Feb 19 | 20
Mar 19 | 21
Apr 19 | 22
May 19 | 23
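The working-day counts in this table can be double-checked outside the database. The sketch below is plain Python (not part of the original question), assuming "working day" means Monday to Friday with no holiday calendar; `working_days` is a hypothetical helper name.

```python
import calendar
from datetime import date, timedelta


def working_days(start: date, end: date) -> int:
    """Count Monday-to-Friday dates in [start, end], inclusive; holidays are ignored."""
    total_days = (end - start).days + 1
    # date.weekday(): Monday == 0 ... Sunday == 6, so < 5 means Mon-Fri
    return sum(1 for i in range(total_days) if (start + timedelta(days=i)).weekday() < 5)


# Working days in each month from January to May 2019
per_month = {}
for month in range(1, 6):
    last_day = calendar.monthrange(2019, month)[1]  # number of days in the month
    per_month[month] = working_days(date(2019, month, 1), date(2019, month, last_day))

print(per_month)  # {1: 23, 2: 20, 3: 21, 4: 22, 5: 23}
```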

The 1st row spans 64 working days in Jan, Feb and Mar, so its value per working day is 64 / 64 = 1. That gives 23 in Jan, 20 in Feb and 21 in Mar.
The 2nd row spans 63 working days in Feb, Mar and Apr, so its value per working day is 126 / 63 = 2. That gives 40 in Feb, 42 in Mar and 44 in Apr.
The 3rd row spans 66 working days in Mar, Apr and May, so its value per working day is 66 / 66 = 1. That gives 21 in Mar, 22 in Apr and 23 in May.
Summing these amounts per month across all rows should give the result below.

Month   | Value
----------------
01-2019 | 23
02-2019 | 60
03-2019 | 84
04-2019 | 66
05-2019 | 23
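As a cross-check of the arithmetic above, here is a small Python sketch (not from the question) that takes the working-days-per-month table as given, computes each row's value per working day, and distributes it over the months. It assumes every range covers whole calendar months, which holds for this sample data; partial months would need day-level counting instead. `Fraction` keeps the division exact.

```python
from fractions import Fraction

# Working days per month, taken from the table in the question; keys are (year, month)
WORKING_DAYS = {(2019, 1): 23, (2019, 2): 20, (2019, 3): 21, (2019, 4): 22, (2019, 5): 23}

# (first month, last month, value); assumes every range covers whole calendar months
rows = [
    ((2019, 1), (2019, 3), 64),
    ((2019, 2), (2019, 4), 126),
    ((2019, 3), (2019, 5), 66),
]

per_day = []   # value per working day for each row
totals = {}    # (year, month) -> weighted amount summed over all rows
for first, last, value in rows:
    months = [m for m in sorted(WORKING_DAYS) if first <= m <= last]
    days = sum(WORKING_DAYS[m] for m in months)
    rate = Fraction(value, days)
    per_day.append(rate)
    for m in months:
        totals[m] = totals.get(m, 0) + rate * WORKING_DAYS[m]

for (year, month), amount in sorted(totals.items()):
    print(f"{month:02d}-{year} | {amount}")
```

This prints the same five lines as the expected result table (23, 60, 84, 66, 23).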

So it's like grouping by month but the values are weighted by the number of working days per month.

Is it possible to do this in Postgres?

Best Answer

Assuming the table has a primary (or unique) key, called id below, you can calculate the "value per day" like this:

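select to_char(dt, 'yyyy-mm') as month,
       value,
       -- cast to numeric so the division is not an integer division
       value::numeric / count(*) over (partition by id) as value_per_day
from data
  -- one row per calendar day between start and "end" (inclusive)
  left join generate_series(start, "end", interval '1 day') as t(dt) on true
-- drop Sundays (dow 0) and Saturdays (dow 6)
where extract(dow from dt) not in (0,6);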

generate_series() generates one row for each day between start and end, and the where clause removes the weekends. id is the unique identifier of the original rows; it is required so that the window function can count the number of working days "per row". With that result we can now aggregate the final result:

select month, sum(value_per_day)
from ( 
  select to_char(dt, 'yyyy-mm') as month, 
         value, 
         value::numeric / count(*) over (partition by id) as value_per_day
  from data
    left join generate_series(start, "end", interval '1 day') as t(dt) on true
  where extract(dow from dt) not in (0,6)  
) t
group by month
order by month;

Online example: https://rextester.com/KPGOFU25611
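To see what each step of the query does without a database, here is a plain-Python mirror of it (an illustration, not the answer's code). The `data` list is a hypothetical stand-in for the table, including the `id` column the answer assumes; `Fraction` plays the role of the `numeric` cast, so nothing is truncated by integer division.

```python
from collections import defaultdict
from datetime import date, timedelta
from fractions import Fraction

# Hypothetical stand-in for the "data" table: (id, start, end, value)
data = [
    (1, date(2019, 1, 1), date(2019, 3, 31), 64),
    (2, date(2019, 2, 1), date(2019, 4, 30), 126),
    (3, date(2019, 3, 1), date(2019, 5, 31), 66),
]

# Inner query: one row per day (generate_series), weekends removed
daily = []  # (id, month, value)
for row_id, start, end, value in data:
    for i in range((end - start).days + 1):
        dt = start + timedelta(days=i)
        if dt.weekday() < 5:  # Python: Monday == 0 ... Sunday == 6
            daily.append((row_id, dt.strftime("%Y-%m"), value))

# count(*) over (partition by id): weekdays remaining for each original row
days_per_id = defaultdict(int)
for row_id, _, _ in daily:
    days_per_id[row_id] += 1

# Outer query: sum(value_per_day) ... group by month order by month
result = defaultdict(Fraction)
for row_id, month, value in daily:
    result[month] += Fraction(value, days_per_id[row_id])

for month in sorted(result):
    print(month, result[month])
```

The window count gives 64, 63 and 66 working days for the three rows, and the grouped sums match the expected 23, 60, 84, 66 and 23.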

Related Solutions

Postgresql – Creating a GroupBy query to include a result for when no results match

The usual way to get a series in Postgres is with generate_series(). This function produces a series of integers or timestamps; you can use either, but assuming your 'dates' are really timestamptz, here's how you might go about it if you are on 8.4 or above:

testbed:

create table sales(sales_date timestamptz, sales_amount numeric);
insert into sales(sales_date, sales_amount) values('2011-01-15 12:00', 100);
insert into sales(sales_date, sales_amount) values('2011-02-15 12:00', 240);
insert into sales(sales_date, sales_amount) values('2011-04-15 12:00', 400);
insert into sales(sales_date, sales_amount) values('2011-04-16 12:00', 30);

query:

with w as ( select month, sum(sales_amount) as total
            from (select date_trunc('month',sales_date) as month, sales_amount from sales) z
            group by month )
select to_char(month, 'fmMon') as month, coalesce(total, 0) as total
from (select generate_series(min(month), max(month), '1 month'::interval) as month from w) m
     left outer join w using(month)
order by m.month;

result:

 month | total
-------+-------
 Jan   |   100
 Feb   |   240
 Mar   |     0
 Apr   |   430

--edit: a bit of extra detail on the query:

produce a summary of sales by month (months without sales are simply absent):

with w as ( select month, sum(sales_amount) as total
            from ( select date_trunc('month',sales_date) as month, sales_amount
                   from sales ) z
            group by month )

which could alternatively be written as:

with w as ( select date_trunc('month',sales_date) as month, sum(sales_amount) as total
            from sales
            group by date_trunc('month',sales_date) )

produce an unbroken series of months, regardless of sales, from the earliest month to the latest:

select generate_series(min(month), max(month), '1 month'::interval) as month from w

outer join the unbroken series to the summary of sales by month to produce an unbroken series with sales (or null sales if no sales present):
```
left outer join w using(month)
```
for the months with null sales, change the null to a 0:
```
coalesce(total, 0)
```
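The same three steps (monthly summary, unbroken month series, outer join with a zero default) can be mirrored in plain Python to check the result table. This is an illustrative sketch only: it uses the testbed rows, ignores time zones, and `month_series` is a hypothetical helper standing in for generate_series() with a one-month step.

```python
import calendar
from datetime import datetime

# Rows from the testbed above (time zone handling is ignored in this sketch)
sales = [
    (datetime(2011, 1, 15, 12, 0), 100),
    (datetime(2011, 2, 15, 12, 0), 240),
    (datetime(2011, 4, 15, 12, 0), 400),
    (datetime(2011, 4, 16, 12, 0), 30),
]

# CTE w: sum(sales_amount) grouped by date_trunc('month', sales_date)
w = {}
for sales_date, amount in sales:
    month = (sales_date.year, sales_date.month)
    w[month] = w.get(month, 0) + amount


def month_series(first, last):
    """Like generate_series(min(month), max(month), '1 month'): every month, no gaps."""
    year, month = first
    while (year, month) <= last:
        yield year, month
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)


# left outer join w using (month), then coalesce(total, 0) for months without sales
result = [(calendar.month_abbr[m], w.get((y, m), 0)) for y, m in month_series(min(w), max(w))]
print(result)  # [('Jan', 100), ('Feb', 240), ('Mar', 0), ('Apr', 430)]
```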

Postgresql – Postgres 9.2 select multiple specific rows in one query

`UNION ALL`

I would go with a simple UNION ALL query here:

(
SELECT *
FROM   tbl
WHERE  timein >= '2013-04-26 0:0'
AND    timein <  '2013-04-27 0:0'
ORDER  BY timein DESC
LIMIT 1
)
UNION ALL
(
SELECT *
FROM   tbl
WHERE  timein >= '2013-04-27 0:0'
AND    timein <  '2013-04-30 0:0'
ORDER  BY timein
);

This is a single query to Postgres.
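Parentheses are required here because the branches have their own ORDER BY (and the first one its own LIMIT); without them the statement is a syntax error.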
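To make the row selection concrete, here is a plain-Python sketch of what the two branches return (an illustration, not a replacement for the query). The `timeins` values are hypothetical sample data standing in for `tbl`; the comparisons mirror the half-open `>=` / `<` ranges in the SQL.

```python
from datetime import datetime

# Hypothetical timein values standing in for the rows of tbl
timeins = [
    datetime(2013, 4, 25, 23, 0),
    datetime(2013, 4, 26, 8, 0),
    datetime(2013, 4, 26, 17, 30),
    datetime(2013, 4, 27, 9, 0),
    datetime(2013, 4, 29, 10, 15),
    datetime(2013, 4, 30, 0, 0),
]

day_start = datetime(2013, 4, 26)   # '2013-04-26 0:0'
day_end = datetime(2013, 4, 27)     # '2013-04-27 0:0'
range_end = datetime(2013, 4, 30)   # '2013-04-30 0:0'

# First branch: rows on 2013-04-26, ORDER BY timein DESC LIMIT 1 -> the latest one
first = sorted((t for t in timeins if day_start <= t < day_end), reverse=True)[:1]

# Second branch: rows from 2013-04-27 up to (not including) 2013-04-30, ORDER BY timein
second = sorted(t for t in timeins if day_end <= t < range_end)

# UNION ALL: append the second result to the first, keeping duplicates
result = first + second
for t in result:
    print(t)
```

With this data it keeps only 17:30 from 2013-04-26, plus the 2013-04-27 and 2013-04-29 rows; the row at exactly 2013-04-30 00:00 falls outside the half-open range.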

`NOT EXISTS`

Alternative, most likely slower:

SELECT *
FROM   tbl t
WHERE  timein >= '2013-04-26 0:0'
AND    timein <  '2013-04-30 0:0'
AND    NOT EXISTS (
   SELECT 1 FROM tbl t1
   WHERE  t1.timein >= '2013-04-26 0:0'
   AND    t1.timein <  '2013-04-27 0:0'
   AND    t1.timein > t.timein
    )
ORDER  BY timein;

-> SQLfiddle.
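The NOT EXISTS filter can be mirrored the same way. This sketch (illustrative, with the same hypothetical sample values) keeps a row from the whole range unless some row on 2013-04-26 has a later timein. Rows after 2013-04-26 always pass, and on 2013-04-26 only the latest row passes, so it returns the same rows as the UNION ALL form. The sketch re-scans all rows for every candidate; what Postgres actually does depends on the plan and available indexes.

```python
from datetime import datetime

# Hypothetical timein values standing in for the rows of tbl
timeins = [
    datetime(2013, 4, 25, 23, 0),
    datetime(2013, 4, 26, 8, 0),
    datetime(2013, 4, 26, 17, 30),
    datetime(2013, 4, 27, 9, 0),
    datetime(2013, 4, 29, 10, 15),
    datetime(2013, 4, 30, 0, 0),
]

day_start = datetime(2013, 4, 26)   # '2013-04-26 0:0'
day_end = datetime(2013, 4, 27)     # '2013-04-27 0:0'
range_end = datetime(2013, 4, 30)   # '2013-04-30 0:0'


def later_row_on_first_day(t):
    """The EXISTS subquery: is there a row on 2013-04-26 with a later timein than t?"""
    return any(day_start <= t1 < day_end and t1 > t for t1 in timeins)


# Outer WHERE: whole range, minus rows that have a later row on 2013-04-26; ORDER BY timein
result = sorted(
    t for t in timeins
    if day_start <= t < range_end and not later_row_on_first_day(t)
)
for t in result:
    print(t)
```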
