Consider the following table:
ID | GROUP_ID | ORDER_VAL | RESET_VAL | VAL
---+----------+-----------+-----------+-----
1 | 1 | 1 | (null) | 3
2 | 1 | 2 | (null) | 2
3 | 1 | 3 | (null) | 1
4 | 1 | 4 | 4 | 2
5 | 1 | 5 | (null) | 1
6 | 2 | 1 | (null) | 4
7 | 2 | 2 | 2 | 3
8 | 2 | 3 | (null) | 4
9 | 2 | 4 | (null) | 2
10 | 2 | 5 | (null) | 2
11 | 2 | 6 | (null) | 4
12 | 2 | 7 | 14 | 2
13 | 2 | 8 | (null) | 2
For each row, I need to compute the cumulative sum of VAL
for all previous rows (ordered by ORDER_VAL
and grouped by GROUP_ID
), but each time a non-NULL
RESET_VAL
is encountered, I need to use that value for the sum. The rows that follow also need to build on top of the RESET_VAL
instead of using the actual sum. Note that each group can have multiple reset values.
This is the result I expect for the above table:
ID | GROUP_ID | ORDER_VAL | RESET_VAL | VAL | CUMSUM
---+----------+-----------+-----------+-----+-------
1 | 1 | 1 | (null) | 3 | 0
2 | 1 | 2 | (null) | 2 | 3
3 | 1 | 3 | (null) | 1 | 5
4 | 1 | 4 | 4 | 2 | 4
5 | 1 | 5 | (null) | 1 | 6
6 | 2 | 1 | (null) | 4 | 0
7 | 2 | 2 | 2 | 3 | 2
8 | 2 | 3 | (null) | 4 | 5
9 | 2 | 4 | (null) | 2 | 9
10 | 2 | 5 | (null) | 2 | 11
11 | 2 | 6 | (null) | 4 | 13
12 | 2 | 7 | 14 | 2 | 14
13 | 2 | 8 | (null) | 2 | 16
If not for the reset values, I could use a window query:
SELECT temp.*,
COALESCE(SUM(val) OVER (PARTITION BY group_id ORDER BY order_val ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING),
0) AS cumsum
FROM temp;
I was originally mistakenly thinking that I could just put RESET_VAL
at the beginning of the COALESCE
, but this won't work since it won't reset the value for following rows.
I also tried this solution, but it only resets back to zero, not to a value in the column. Adapting it to do so is proving non-trivial because that value must propagate to all following rows.
A recursive query seems like a natural fit, but I haven't been able to figure out how to do that as of yet.
I should probably mention that the table I actually have to deal with is much larger (hundreds of thousands to a couple million rows) than the example above, so please mention if there are any performance pitfalls with any answer.
Best Answer
The following works but there's probably some more clever version. Explanation of the query's logic:
We first find how many "resets" have been done up to and including the current row by counting the not nulls of the
reset_val
column, so we can separate the rows into subgroups.We also use another window function
LAST_VALUE()
withIGNORE NULLS
, so we can find the lastreset_value
.Note that both these window functions
COUNT()
andLAST_VALUE()
have anORDER BY
, therefore the default windowROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
. Omitted in the query, to make the code more clear.Assuming that the
val
is not nullable, the other window function could also be shortened, from:(avoiding the
COALESCE()
as well) to:Finally, in the second cte, we use the above found subgroups (using
PARTITION BY group_id, reset_count
) to find the cumulative sums.Test at SQLfiddle.
Another variation, based on @Chris' recursive answer. (slightly improved, works with non-consecutive
order_val
, avoids the finalGROUP BY
).Also works in case the first row of a group has a
reset_val
:Test at SQLfiddle-2.
One more variation, using the older (proprietary)
CONNECT BY
syntax for recursive queries. More compact but I find it more difficult to write and read than the CTE version:Tested at SQLfiddle-3.