Postgresql – Transformation of row data based on a grouping of rows preceding it

migrationpostgresqlwindow functions

The basis for this question is: Transformation of row data based on preceding row for special cumulative sum which I had asked recently.

I will now raise the level of complexity a bit.

First, I will quickly reiterate over the make up of the data.

This is the initial data:

id      creation                    operation       value       running sum
SyJw-c  2016-09-01 00:11:08.307419  positive_op_1   1.33        28.82
SyJw-c  2016-08-21 08:32:54.431662  negative_op_1   -1          27.49
SyJw-c  2016-08-18 07:38:33.878365  positive_op_2   1           28.49
SyJw-c  2016-08-14 18:12:03.599797  negative_op_1   -1          27.49
SyJw-c  2016-08-02 15:44:29.693303  positive_op_1   1.33        28.49
SyJw-c  2016-07-31 12:08:50.659905  override_op_1   4.66        27.16
SyJw-c  2016-06-26 06:53:54.537603  negative_op_1   -3.5        22.5
SyJw-c  2016-05-31 13:34:08.005687  negative_op_1   -1          26
SyJw-c  2016-05-31 13:34:04.776970  negative_op_1   -1          27
SyJw-c  2016-05-31 11:27:09.502983  override_op_2   28          28

As clarified by @Erwin Brandstetter

This is the definition of the table

CREATE TABLE tbl (  -- no PK
  id text NOT NULL
, creation timestamp UNIQUE NOT NULL
, operation text NOT NULL
, value numeric NOT NULL
, running_sum numeric  -- optional (not needed for task)
);

This is the transformed data:

id      creation    oper      transformed_op   value   transformed_value       running sum
SyJw-c  2016- ...  pos_op_1                    1.33    1.33                    10.98
SyJw-c  2016- ...  neg_op_1                    -1      -1                      9.65
SyJw-c  2016- ...  pos_op_2                    1       1                       10.65
SyJw-c  2016- ...  neg_op_1                    -1      -1                      9.65
SyJw-c  2016- ...  pos_op_1                    1.33    1.33                    10.65
SyJw-c  2016- ...  ovr_op_1   new_rel_op_1     4.66    (4.66-22.5) = -17.84    4.66
SyJw-c  2016- ...  neg_op_1                    -3.5    -3.5                    22.5
SyJw-c  2016- ...  neg_op_1                    -1      -1                      26
SyJw-c  2016- ...  neg_op_1                    -1      -1                      27
SyJw-c  2016- ...  ovr_op_2   new_rel_op_2     28      (28-0) = 28             28

I run on postgresql 9.5


I wish to be able to calculate the running_sum and the transformed_value.

In contrast to the question asked and answered above, the predicate for calculating the transformation is not based solely on the row previous (here the table, as in the other question, is ordered by creation DESC).
The logic is as follows:

  • If the operation is an override_op_2 then the running_sum gets the value of the value column and the transformed_value is the value of the value column subtracted by the value of the running_sum of the previous row` (Nothing different than the original question)
  • If the operation is an override_op_1 then the logic follows that of the override_op_2 unless an override_op_2 had previously been encountered (for the id grouping), in this case it should be disregarded (or more explicitly the transformed_value is 0)

What seems more difficult here than in the original question is that the calculation is not based directly on the row preceding it but on a number of rows which is unknown in advance.

While I guess you could do that within a function (that stores a perviously encountered override_op_2 value (or NULL)), I wonder of it could be done in simple sql.

Best Answer

  • If the operation is an override_op_1 then the logic follows that of the override_op_2 unless an override_op_2 had previously been encountered (for the id grouping), in this case it should be disregarded (or more explicitly the transformed_value is 0)

In effect, to disregard such rows, change them:

  • value = 0 - no change in running sum.
  • operation = 'zero_op' - any non-overriding type (don't start a new partition)

Do that to qualifying rows before applying everything else I already worked out in my previous answer:

SELECT id, creation
     , CASE WHEN operation = 'override_op_1'
            AND  had_op_2 THEN 'zero_op' ELSE operation END AS operation
     , CASE WHEN operation = 'override_op_1'
            AND  had_op_2 THEN 0 ELSE value END AS value
FROM  (
   SELECT id, creation, operation, value
        , bool_or(operation = 'override_op_2')  -- `override_op_2` previously encountered
          OVER (PARTITION BY id ORDER BY creation) AS had_op_2
   FROM   tbl
) t

So the complete query becomes:

SELECT *
     , running_sum - lag(running_sum, 1, numeric '0')  -- data type must match!
                     OVER (PARTITION BY id ORDER BY creation) AS transformed_value 
FROM (
   SELECT id, creation, operation, value
        , sum(value) OVER (PARTITION BY id, run ORDER BY creation) AS running_sum
   FROM   (
      SELECT *, count(*) FILTER (WHERE operation LIKE 'override_op_%')
                OVER (PARTITION BY id ORDER BY creation) AS run
      FROM  (
         SELECT id, creation
              , CASE WHEN operation = 'override_op_1'
                     AND  had_op_2 THEN 'zero_op' ELSE operation END AS operation
              , CASE WHEN operation = 'override_op_1'
                     AND  had_op_2 THEN 0 ELSE value END AS value
         FROM  (
            SELECT id, creation, operation, value
                 , bool_or(operation = 'override_op_2')
                   OVER (PARTITION BY id ORDER BY creation) AS had_op_2
            FROM   tbl
            ) t
         ) t
      ) t
   ) t
ORDER  BY id, creation DESC;

Alternatively, you could just delete such rows or move them out of the way (SET id = id * -1). Not sure if you need to preserve these "zero_op" rows in the result.