Oracle 11g R2 – Cumulative Sum with Reset Value

oracleoracle-11g-r2

Consider the following table:

ID | GROUP_ID | ORDER_VAL | RESET_VAL | VAL 
---+----------+-----------+-----------+-----
1  | 1        | 1         | (null)    | 3   
2  | 1        | 2         | (null)    | 2   
3  | 1        | 3         | (null)    | 1   
4  | 1        | 4         | 4         | 2   
5  | 1        | 5         | (null)    | 1   
6  | 2        | 1         | (null)    | 4   
7  | 2        | 2         | 2         | 3   
8  | 2        | 3         | (null)    | 4   
9  | 2        | 4         | (null)    | 2   
10 | 2        | 5         | (null)    | 2   
11 | 2        | 6         | (null)    | 4   
12 | 2        | 7         | 14        | 2   
13 | 2        | 8         | (null)    | 2

For each row, I need to compute the cumulative sum of VAL for all previous rows (ordered by ORDER_VAL and grouped by GROUP_ID), but each time a non-NULL RESET_VAL is encountered, I need to use that value for the sum. The rows that follow also need to build on top of the RESET_VAL instead of using the actual sum. Note that each group can have multiple reset values.

This is the result I expect for the above table:

ID | GROUP_ID | ORDER_VAL | RESET_VAL | VAL | CUMSUM
---+----------+-----------+-----------+-----+-------
1  | 1        | 1         | (null)    | 3   | 0
2  | 1        | 2         | (null)    | 2   | 3
3  | 1        | 3         | (null)    | 1   | 5
4  | 1        | 4         | 4         | 2   | 4
5  | 1        | 5         | (null)    | 1   | 6
6  | 2        | 1         | (null)    | 4   | 0
7  | 2        | 2         | 2         | 3   | 2
8  | 2        | 3         | (null)    | 4   | 5
9  | 2        | 4         | (null)    | 2   | 9
10 | 2        | 5         | (null)    | 2   | 11
11 | 2        | 6         | (null)    | 4   | 13
12 | 2        | 7         | 14        | 2   | 14
13 | 2        | 8         | (null)    | 2   | 16

If not for the reset values, I could use a window query:

SELECT temp.*,
       COALESCE(SUM(val) OVER (PARTITION BY group_id ORDER BY order_val ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING),
                0) AS cumsum
FROM temp;

SQLFiddle of the above

I was originally mistakenly thinking that I could just put RESET_VAL at the beginning of the COALESCE, but this won't work since it won't reset the value for following rows.

I also tried this solution, but it only resets back to zero, not to a value in the column. Adapting it to do so is proving non-trivial because that value must propagate to all following rows.

A recursive query seems like a natural fit, but I haven't been able to figure out how to do that as of yet.

I should probably mention that the table I actually have to deal with is much larger (hundreds of thousands to a couple million rows) than the example above, so please mention if there are any performance pitfalls with any answer.

Best Answer

The following works but there's probably some more clever version. Explanation of the query's logic:

We first find how many "resets" have been done up to and including the current row by counting the not nulls of the reset_val column, so we can separate the rows into subgroups.

We also use another window function LAST_VALUE() with IGNORE NULLS, so we can find the last reset_value.

Note that both these window functions COUNT() and LAST_VALUE() have an ORDER BY, therefore the default window ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. Omitted in the query, to make the code more clear.

Assuming that the val is not nullable, the other window function could also be shortened, from:

       COALESCE(SUM(val) OVER 
         (PARTITION BY group_id, reset_count 
          ORDER BY order_val 
          ROWS BETWEEN UNBOUNDED PRECEDING 
                   AND 1 PRECEDING), 0)

(avoiding the COALESCE() as well) to:

       SUM(val) OVER 
         (PARTITION BY group_id, reset_count 
          ORDER BY order_val)
       - val

Finally, in the second cte, we use the above found subgroups (using PARTITION BY group_id, reset_count) to find the cumulative sums.

WITH x AS
  ( SELECT temp.*, 
           COUNT(reset_val) OVER 
               (PARTITION BY group_id 
                ORDER BY order_val)
             AS reset_count,
           COALESCE(LAST_VALUE(reset_val IGNORE NULLS) OVER 
               (PARTITION BY group_id 
                ORDER BY order_val), 0)
             AS reset_value
    FROM temp
  ) ,
y AS 
  ( SELECT x.*,
           COALESCE(SUM(val) OVER 
             (PARTITION BY group_id, reset_count 
              ORDER BY order_val 
              ROWS BETWEEN UNBOUNDED PRECEDING 
                       AND 1 PRECEDING), 0)            
           + reset_value AS cumsum      
    FROM x
  )
SELECT *
FROM y ;

Test at SQLfiddle.

Another variation, based on @Chris' recursive answer. (slightly improved, works with non-consecutive order_val, avoids the final GROUP BY).
Also works in case the first row of a group has a reset_val:

WITH row_nums AS
  ( SELECT id, group_id, order_val, reset_val, val, 
           ROW_NUMBER() OVER (PARTITION BY group_id
                              ORDER BY order_val)
             AS rn
    FROM temp
  ) ,
updated_temp (id, group_id, order_val, reset_val, val, rn, cumsum) AS
  ( SELECT id, group_id, order_val, reset_val, val, rn, 
           COALESCE(reset_val, 0)
    FROM row_nums
    WHERE rn = 1
  UNION ALL
    SELECT curr.id, curr.group_id, curr.order_val, curr.reset_val, curr.val, curr.rn, 
           COALESCE(curr.reset_val, prev.val + prev.cumsum) 
    FROM row_nums  curr 
      JOIN updated_temp  prev 
        ON  curr.rn-1 = prev.rn 
        AND curr.group_id = prev.group_id
  )
SELECT id, group_id, order_val, reset_val, val, cumsum
FROM updated_temp
ORDER BY group_id, order_val ;

Test at SQLfiddle-2.

One more variation, using the older (proprietary) CONNECT BY syntax for recursive queries. More compact but I find it more difficult to write and read than the CTE version:

WITH row_nums AS
  ( SELECT id, group_id, order_val, reset_val, val, 
           ROW_NUMBER() OVER (PARTITION BY group_id
                              ORDER BY order_val)
             AS rn,
           COALESCE(reset_val, 0) AS cumsum
    FROM temp
  ) 
SELECT id, group_id, order_val, reset_val, val, rn,  
       COALESCE(reset_val, PRIOR val + PRIOR cumsum, 0) AS cumsum
FROM row_nums
START WITH rn = 1 OR reset_val IS NOT NULL
CONNECT BY  rn-1 = PRIOR rn 
        AND group_id = PRIOR group_id
        AND reset_val IS NULL 
ORDER BY group_id, order_val ;

Tested at SQLfiddle-3.

Best Answer

Related Solutions

Sql-server – Oracle query against SQL Server database giving inconsistent results

LNNVL Justification

Related Question