PostgreSQL – Transform Row Data Based on Preceding Row for Cumulative Sum

migrationpostgresqlwindow functions

Following the answer to this question, I have managed to produce the following output to get a running sum of values:

id      creation                    operation       value       running sum
SyJw-c  2016-09-01 00:11:08.307419  positive_op_1   1.33        28.82
SyJw-c  2016-08-21 08:32:54.431662  negative_op_1   -1          27.49
SyJw-c  2016-08-18 07:38:33.878365  positive_op_2   1           28.49
SyJw-c  2016-08-14 18:12:03.599797  negative_op_1   -1          27.49
SyJw-c  2016-08-02 15:44:29.693303  positive_op_1   1.33        28.49
SyJw-c  2016-07-31 12:08:50.659905  override_op_1   4.66        27.16
SyJw-c  2016-06-26 06:53:54.537603  negative_op_1   -3.5        22.5
SyJw-c  2016-05-31 13:34:08.005687  negative_op_1   -1          26
SyJw-c  2016-05-31 13:34:04.776970  negative_op_1   -1          27
SyJw-c  2016-05-31 11:27:09.502983  override_op_2   28          28

BUT my case is more complex. Not only do I need to sum up the values, I need to be able to perform a conversion first to some rows based on the running sum of the row right beneath it.

Let me first explain the motivation:

Currently I have a table with incremental, decremental and override operations. I would like to port the data to a table with only incremental and decremental operations such that I would be able to straightforwardly sum up the values. I am not looking to maintain the old table, simply a way to migrate the data into a simpler model and henceforward to append data to the new table only.

Taken the "raw" table above, I would like to write a query (I am running on postgresql 9.5) and get a table as closely resembling the below. (Conversely, I would like to know that what I am attempting is impossible)

Note the override operators are interspersed between the normal operators, they may appear more than twice as in the example, also, all initial operators (the earliest in the table) are override with an initial value that should be taken into account as in the example below. Moreover I had shown only data belonging to one group (same id) but the general idea is to perform this migration for all groups. Lastly I show the math in parentheses, I don't need that in the result, it is for the example only:

id      creation    oper      transformed_op   value   transformed_value       running sum
SyJw-c  2016- ...  pos_op_1                    1.33    1.33                    10.98
SyJw-c  2016- ...  neg_op_1                    -1      -1                      9.65
SyJw-c  2016- ...  pos_op_2                    1       1                       10.65
SyJw-c  2016- ...  neg_op_1                    -1      -1                      9.65
SyJw-c  2016- ...  pos_op_1                    1.33    1.33                    10.65
SyJw-c  2016- ...  ovr_op_1   new_rel_op_1     4.66    (4.66-22.5) = -17.84    4.66
SyJw-c  2016- ...  neg_op_1                    -3.5    -3.5                    22.5
SyJw-c  2016- ...  neg_op_1                    -1      -1                      26
SyJw-c  2016- ...  neg_op_1                    -1      -1                      27
SyJw-c  2016- ...  ovr_op_2   new_rel_op_2     28      (28-0) = 28             28

The table is shown ordered from last to first. The 22.5 is 26 - 3.5. This subtraction should be done (this_value - previous_sum) depends on the value of the transformed_op. When the original op had been an override op I would like to perform some action based on the running sum of the row proceeding it (ordered by creation desc), in this case subtract that running sum value from the value in the value column.

Best Answer

Based on this table definition:

CREATE TABLE tbl (  -- no PK?
  id text NOT NULL
, creation timestamp UNIQUE NOT NULL
, operation text NOT NULL
, value numeric NOT NULL
, running_sum numeric  -- optional (not needed for task)
);

Data types and constraints are almost always essential.
(creationdoes not strictly have to be unique. But if there can be duplicate values per group (id), you need to do more.)

Basic query to compute your special running sum

SELECT id, creation, operation, value
     , sum(value) OVER (PARTITION BY id, run ORDER BY creation) AS running_sum
FROM   (
   SELECT *, count(*) FILTER (WHERE operation  LIKE 'override_op_%')
                      OVER (PARTITION BY id ORDER BY creation) AS run
   FROM   tbl
   ) t
ORDER  BY id, creation DESC;

Any operation name starting with 'override_op_' indicates the start of a new run (group, patch, partition).

In addition to the related answer you already linked to:

Consider this related question for details how to partition rows into groups (run in this query, since you are using the term "group" for the id column)

I use the new aggregate FILTER clause for the partial count:

You could use the simpler (less clear) expression in older versions:

count(operation LIKE 'override_op_%' OR NULL)

Compute delta

Based on this, you can easily compute the desired delta:

SELECT *
     , running_sum - lag(running_sum, 1, numeric '0')  -- data type must match!
                     OVER (PARTITION BY id ORDER BY creation) AS transformed_value 
FROM (
   SELECT id, creation, operation, value
        , sum(value) OVER (PARTITION BY id, run ORDER BY creation) AS running_sum
   FROM   (
      SELECT *, count(*) FILTER (WHERE operation  LIKE 'override_op_%')
                OVER (PARTITION BY id ORDER BY creation) AS run
      FROM   tbl
      ) t
   ) t
ORDER  BY id, creation DESC;

I use the 3-parameter form of the window function lag() to provide 0 (data type must match!) as default value for the first row in the table.