I am using SQL Server 2012 and have a db with around 2 million rows. I need to figure out how to calculate a rolling sum over a 30 day range and return any results over a set amount (amt_pur
). These are transaction orders so the id number (ID_NO
) is the same but transaction date(TD
) varies by minutes to many years at times. So if a customer ordered 10 items within any 30 day period I need those results. I tried something like this and many googled variations:
select id_no, td,
sum(amt_pur) as amtpur from db1 where td between td and dateadd(day,30,td)
group by id_no, td, amt_pur having sum(amt_pur)>10
order by amtpur desc
This does not get me rolling type calculation results. I have a specific ID I can verify this with so I know I'm not getting the right ones. Please help!
Best Answer
I haven't tested this extensively but I think it can do what you need. Here I assume that ID_NO is a customer identifier, TD is the date of the order, and AMT_PUR represents the amount ordered. You want all of the rows for when a customer has ordered 10 or more of anything over a 30 day date range starting with that day.
Here is a little bit of test data:
Suppose that you just looped over each ID_NO in reverse TD order and kept track of the running sum of AMT_PUR. You couldn't just keep the rows with a running sum that's greater or equal to 10 because some of the rows previously added to the total may not be within 30 days of the current row. However, if there was a way to remove rows too far away from the current row then you could just use the running sum and you could do your calculation with a single pass over the data.
The idea behind this approach is to UNION ALL together two copies of the data. The second copy will have the AMT_PUR value reversed and will have 31 days subtracted from the original value. With the data in that format you can perform the calculation with just one pass. We can loop over the data this way using SUM() with the OVER() clause.
Minor update: It would probably be more efficient to CROSS JOIN db1 to a two row table instead of using UNION ALL, but I'm going to keep the UNION ALL approach in my code because I think that's easier to understand.