Postgresql – Calculate rolling sum over 7 consecutive day period on PostgreSQL

aggregatepostgresqlpostgresql-9.4window functions

I need to get the rolling sum over a period of 7 days for each row (1 row per day).

For example:

| Date       | Count | 7-Day Rolling Sum |
------------------------------------------
| 2016-02-01 | 1     | 1
| 2016-02-02 | 1     | 2
| 2016-02-03 | 2     | 4
| 2016-02-04 | 2     | 6
| 2016-02-05 | 2     | 8
| 2016-02-06 | 2     | 10
| 2016-02-07 | 2     | 12
| 2016-02-08 | 2     | 13 --> here we start summing from 02-02
| 2016-02-09 | 2     | 14 --> here we start summing from 02-03
| 2016-02-10 | 5     | 17 --> here we start summing from 02-04

I need this in one query that returns the rows with 7-day rolling sum and the date of the last day of the range of the sum. For example, day=2016-02-10, sum 17.

So far I have this but it's not fully working:

DO
$do$
DECLARE 
    curr_date date;
    num bigint;
BEGIN
FOR curr_date IN (SELECT date_trunc('day', d)::date FROM generate_series(CURRENT_DATE-31, CURRENT_DATE-1, '1 day'::interval) d)
LOOP 
    SELECT curr_date, SUM(count)
    FROM generate_series (curr_date-8, curr_date-1, '1 day'::interval) d
    LEFT JOIN m.ping AS p ON p.date = d
    LEFT JOIN m.ping_type AS pt ON pt.id = p.ping_type_id
    LEFT JOIN m.ping_frequency AS pf ON pf.id = p.ping_frequency_id
    WHERE
        pt.url_slug = 'active' AND
        pf.url_slug = 'weekly';
END LOOP;
END
$do$;

I am using PostgreSQL 9.4.5. There could be multiple rows with the same date. If there is a gap (a day is missing), the 7-consecutive-day range will still be followed.

Best Answer

By far the cleanest solution is to use window function sum with rows between:

with days as (
        SELECT date_trunc('day', d)::date as day
        FROM generate_series(CURRENT_DATE-31, CURRENT_DATE-1, '1 day'::interval) d ),
    counts as (
        select 
            days.day,
            sum((random()*5)::integer) num
        FROM days
        -- left join other tables here to get counts, I'm using random
        group by days.day
    )
select
    day,
    num,
    sum(num) over (order by day ROWS BETWEEN 6 PRECEDING AND CURRENT ROW)
from counts
order by day;

The important part is to generate the timeframe in days CTE and join onto it in order not to miss any days for which there is no data.

Example

For example, if I create some test data with 20 records in last 14 days:

SELECT (current_date - ((random()*14)::integer::text || 'days')::interval)::date as day, (random()*7)::integer as num
into test_data from generate_series(1, 20);;

And also add a value before that:

insert into test_data values ((current_date - '25 days'::interval), 5);

Then use the query above:

with days as (
        SELECT date_trunc('day', d)::date as day
        FROM generate_series(CURRENT_DATE-31, CURRENT_DATE-1, '1 day'::interval) d ),
    counts as (
        select 
            days.day,
            sum(t.num) num
        FROM days
        left join test_data t on t.day = days.day
        group by days.day
    )
select
    day,
    num,
    sum(num) over (order by day rows between 6 preceding and current row)
from counts
order by day;

And get the results for whole month:

    day     | num | sum 
------------+-----+-----
 2016-01-31 |     |    
 2016-02-01 |     |    
 2016-02-02 |     |    
 2016-02-03 |     |    
 2016-02-04 |     |    
 2016-02-05 |     |    
 2016-02-06 |   5 |   5
 2016-02-07 |     |   5
 2016-02-08 |     |   5
 2016-02-09 |     |   5
 2016-02-10 |     |   5
 2016-02-11 |     |   5
 2016-02-12 |     |   5
 2016-02-13 |     |    
 2016-02-14 |     |    
 2016-02-15 |     |    
 2016-02-16 |     |    
 2016-02-17 |     |    
 2016-02-18 |   2 |   2
 2016-02-19 |   5 |   7
 2016-02-20 |     |   7
 2016-02-21 |   4 |  11
 2016-02-22 |  15 |  26
 2016-02-23 |   1 |  27
 2016-02-24 |   1 |  28
 2016-02-25 |   2 |  28
 2016-02-26 |   4 |  27
 2016-02-27 |   9 |  36
 2016-02-28 |   5 |  37
 2016-02-29 |  11 |  33
 2016-03-01 |   5 |  37
(31 rows)