Sql-server – MS SQL rolling grouped sum for every new data entry

group bysql server

I am using MS SQL 2016, and I have a table which looks something like that (simplified)

 Object      Date         Amount  Owner  TypeOwner
  O1        20180101        10      A     X      
  O2        20180101        50      B     Y
  O1        20180101        30      C     X
  O1        20180203        25      B     Y
  O2        20180203        20      B     Y
  O1        20180206        15      A     X
  O2        20180206        20      A     X

Simply put, owners declare how much of an object they have, at a date they choose. What I am looking for is reconstruct the history of ownership of the objects with respect to a column like TypeOwner, i.e. each time a new data point is available for a owner, recompute the current ownership. So the output table should look something like

Object      Date         Total  TypeOwner  
  O1        20180101        40      X   
  O2        20180101        50      Y 

  O1        20180203        25      Y
  O2        20180203        20      Y
  O1        20180103        40      X    

  O1        20180206        45      X
  O2        20180206        20      X
  O1        20180206        25      Y
  O2        20180206        20      Y

Basically each time there is a new entry in the table, it should look at all the most recent entries for any other owners and recompute a grouped sum per Object, TypeOwner.

How I obtain the result table:

  • As of 2018-01-01, we know that A owns 10 O1, B owns 50 O2 and C owns 30 O1. So group X (A and C) owns 40 O1 and group Y (B) owns 50 O2. That's the first two lines of the results.
  • On 2018-02-03, we have new data for B. This overrides the past entry for that owner. So as of this date A still owns 10 O1, C still owns 30 O1 and now B owns 25 O1 and 20 O2. So group X (A and C) owns 40 O1, group Y (B) owns 25 O1 and 20 O2.
  • On 2018-02-06, we have new data for A. This overrides the past entry for that owner. As of this date, A now owns 15 O1 and 20 O2, C still owns 30 O1 (2018-01-01), and B still owns 25 O1 and 20 O2 (2018-02-03). Once again group X (A and C) owns 45 O1 and 20 O2, group Y (B) owns 25 O1 and 20 O2

I tried using various techniques like inner join/cross apply to select last records up to some date, but I am not sure what would be te most efficient solution here. The real table is on thousands of objects and owners, hence any speed gain would be great.

Thank you

Best Answer

This will get you the desired result set. It works by 1) creating the missing [object, date, owner] row combinations and then 2) doing a "data smear" to carry the most recent non-NULL amount forward. And 3) filter out any rows that did not have a previous amount to pull from.

IF OBJECT_ID('tempdb..#TestData', 'U') IS NOT NULL 
DROP TABLE #TestData;

CREATE TABLE #TestData (
    Object CHAR(2) NOT NULL,
    [Date] DATE NOT NULL,
    Amount INT NOT NULL,
    Owner CHAR(1) NOT NULL,
    TypeOwner CHAR(1) NOT NULL 
    );
INSERT #TestData (Object, Date, Amount, owner, TypeOwner) VALUES
  ('01', '20180101', 10, 'A', 'X'),      
  ('02', '20180101', 50, 'B', 'Y'),
  ('01', '20180101', 30, 'C', 'X'),
  ('01', '20180203', 25, 'B', 'Y'),
  ('02', '20180203', 20, 'B', 'Y'),
  ('01', '20180206', 15, 'A', 'X'),
  ('02', '20180206', 20, 'A', 'X');


--SELECT * FROM #TestData;

WITH 
    cte_cross_values AS (   -- 
        SELECT 
            o.object,
            d.date,
            ot.owner,
            ot.typeowner
        FROM (
            VALUES ('01'),('02') ) o (object)
            CROSS APPLY ( VALUES ('2018-01-01'),('2018-02-03'),('2018-02-06') ) d (date)
            CROSS APPLY ( VALUES ('A','X'),('c','X'),('B','Y') ) ot (owner, typeowner)
        ),
    cte_rolling_amtount AS (
        SELECT 
            cv.object,
            cv.date,
            RollingAmount = CONVERT(INT, SUBSTRING(MAX(bv.bin_val) OVER (PARTITION BY cv.owner, cv.typeowner, cv.object ORDER BY cv.date), 4, 4)),
            cv.owner,
            cv.typeowner
        FROM
            cte_cross_values cv
            LEFT JOIN #TestData td
                ON cv.date = td.Date
                AND cv.object = td.Object
                AND cv.owner = td.Owner
                AND cv.typeowner = td.TypeOwner
            CROSS APPLY ( VALUES (CONVERT(BINARY(3), td.Date) + CONVERT(BINARY(4), td.Amount)) ) bv (bin_val)
        )

SELECT 
    ra.object,
    ra.date,
    amount = SUM(ra.RollingAmount),
    ra.typeowner
FROM
    cte_rolling_amtount ra
WHERE 
    ra.RollingAmount IS NOT NULL 
GROUP BY 
    ra.object,
    ra.date,
    ra.typeowner
ORDER BY 
    ra.date,
    ra.typeowner,
    ra.object;

Results:

object date             amount      typeowner
------ ----------       ----------- ---------
01     2018-01-01       40          X
02     2018-01-01       50          Y
01     2018-02-03       40          X
01     2018-02-03       25          Y
02     2018-02-03       20          Y
01     2018-02-06       45          X
02     2018-02-06       20          X
01     2018-02-06       25          Y
02     2018-02-06       20          Y