Mysql – Update with rolling sum grouped by another column

casemysql-8.0performancequeryquery-performancesum

I'm trying to build a query to update a MySQL 8.0 table that looks like this

-----------------------
user_id | total | delta
-----------------------
      1 |     0 |    30
      1 |     0 |   -10
      2 |     0 |    -5
      2 |     0 |    10
      2 |     0 |   -10
      3 |     0 |    30

Into this:

-----------------------
user_id | total | delta
-----------------------
      1 |    30 |    30
      1 |    20 |   -10
      2 |    -5 |    -5
      2 |     5 |    10
      2 |    -5 |   -10
      3 |    30 |    30

Basically it should compute a rolling sum for each user and update the table with the value at each row. You can think of each row as being a transaction that takes place at a certain timepoint, but for simplicity's sake I've just removed the date info. For this example just presume that they're already ordered by 'date'.

It's easy to do something like this to do a rolling sum over the entire column:

SET @sum := 0;
UPDATE tablename
SET total = (@sum := @sum + delta)

But I'm not sure how to get it to perform separate rolling sums. I also have hundreds of millions of entries, so it needs to be performant.

Best Answer

You will need some defined ordering of your rows, otherwise, you will end up with a different CUMULATIVE SUM every time you run your update. I added an auto-increment column, but you likely have a timestamp or similar you can use:

create table T 
( seqid int not null auto_increment primary key
, user_id int not null
, total int not null
, delta int not null 
);

insert into T (user_id, total, delta)
values (1, 0, 30)
     , (1, 0, -10)
     , (2, 0, -5)
     , (2, 0, 10)
     , (2, 0, -10)
     , (3, 0, 30);

Window functions were introduced in SQL99 some 20 years ago, but only recently made it into MySQL. They are very handy for this kind of task:

with TT as (
    select seqid, user_id, delta, total
         , sum(delta) over (partition by user_id order by seqid) as new_total
    from T
)
update T, TT
    set T.total = TT.new_total
where T.seqid = TT.seqid;

For some reason I could not get the update to work with an ansi join, it is left as an exercise for the reader;-)