Sql-server – Rolling SUM with a 30 day range

group bysql serversum

I am using SQL Server 2012 and have a db with around 2 million rows. I need to figure out how to calculate a rolling sum over a 30 day range and return any results over a set amount (amt_pur). These are transaction orders so the id number (ID_NO) is the same but transaction date(TD) varies by minutes to many years at times. So if a customer ordered 10 items within any 30 day period I need those results. I tried something like this and many googled variations:

select id_no, td, 
sum(amt_pur) as amtpur from db1 where td between td and dateadd(day,30,td)
group by id_no, td, amt_pur having sum(amt_pur)>10 
order by amtpur desc

This does not get me rolling type calculation results. I have a specific ID I can verify this with so I know I'm not getting the right ones. Please help!

Best Answer

I haven't tested this extensively but I think it can do what you need. Here I assume that ID_NO is a customer identifier, TD is the date of the order, and AMT_PUR represents the amount ordered. You want all of the rows for when a customer has ordered 10 or more of anything over a 30 day date range starting with that day.

Here is a little bit of test data:

CREATE TABLE DB1 (
ID_NO VARCHAR(10) NULL,
TD DATETIME NULL,
AMT_PUR INTEGER NULL
);

BEGIN TRANSACTION;
INSERT INTO DB1 VALUES ('A', '01/01/2016', 3);
INSERT INTO DB1 VALUES ('A', '01/05/2016', 1);
INSERT INTO DB1 VALUES ('A', '01/09/2016', 2);
INSERT INTO DB1 VALUES ('A', '02/01/2016', 7);
INSERT INTO DB1 VALUES ('A', '02/02/2016', 2);
INSERT INTO DB1 VALUES ('A', '03/01/2016', 2);
INSERT INTO DB1 VALUES ('A', '03/04/2016', 1);
INSERT INTO DB1 VALUES ('B', '02/01/2016', 3);
INSERT INTO DB1 VALUES ('B', '02/28/2016', 6);
INSERT INTO DB1 VALUES ('B', '03/05/2016', 3);
INSERT INTO DB1 VALUES ('B', '03/28/2016', 6);
COMMIT TRANSACTION;

Suppose that you just looped over each ID_NO in reverse TD order and kept track of the running sum of AMT_PUR. You couldn't just keep the rows with a running sum that's greater or equal to 10 because some of the rows previously added to the total may not be within 30 days of the current row. However, if there was a way to remove rows too far away from the current row then you could just use the running sum and you could do your calculation with a single pass over the data.

The idea behind this approach is to UNION ALL together two copies of the data. The second copy will have the AMT_PUR value reversed and will have 31 days subtracted from the original value. With the data in that format you can perform the calculation with just one pass. We can loop over the data this way using SUM() with the OVER() clause.

SELECT
ID_NO
, TD
, REVERSE_RUNNING_SUM
FROM
(
    SELECT 
      ID_NO
    , TD
    , SUM(AMT_PUR) OVER (PARTITION BY ID_NO ORDER BY TD DESC, VALID_FLAG ASC) REVERSE_RUNNING_SUM
    , VALID_FLAG
    FROM 
    (
        SELECT ID_NO
        , TD
        , AMT_PUR
        , 1 VALID_FLAG
        FROM db1

        UNION ALL

        SELECT ID_NO
        , DATEADD(DAY, -31, TD) TD
        , -1 * AMT_PUR
        , -1 VALID_FLAG
        FROM db1
    ) t
) tt
WHERE tt.VALID_FLAG = 1
AND tt.REVERSE_RUNNING_SUM >= 10;

Minor update: It would probably be more efficient to CROSS JOIN db1 to a two row table instead of using UNION ALL, but I'm going to keep the UNION ALL approach in my code because I think that's easier to understand.