Sql-server – SQL Server – Multiple running totals

performancequeryscalabilitysql server

I've got a base table with transactions and I need to create a table with running totals. I need them to be per account and and also have a few running totals for each account (depending on the transaction type), and inside that, some running totals per sub-account.

My base table has these fields (more or less) :

AccountID  |  SubAccountID   |  TransactionType  |  TransactionAmount

Considering I've got about 4 types of running totals per Account/TransactionType and 2 more running totals per Account/SubAccount/TransactionType , and I've got about 2M accounts with about 10 sub accounts each, and I'm getting about 10K transactions every minute (at maximum load), how would you do it?

It's also a must that this run asynchronously via an SQL job, creating the aggregations without being part of the transactions themselves.

I'm pretty stuck using a cursor here – which takes way too long. I'd really appreciate any advice / articles that are doing more or less the same.

Best Answer

Asynchronous implies that the running totals do not need to be completely accurate at all times, or your data change patterns are such that a one-off running total build will be valid and accurate until the next load. Anyway, I'm sure you have thought that part through, so I won't labour the point.

Your main options for a high-performance, supported, method are a SQLCLR function/procedure, or an UPDATE based on Hugo Kornelis' set-based iteration method. The SQLCLR method (implemented in a procedure, but reasonably easy to translate) can be found here.

I haven't been able to find Hugo's method online, but it is detailed in the excellent MVP Deep Dives (Volume 1). Sample code to illustrate Hugo's method (copied from one of my posts on another site you may not have a login for) is shown below:

-- A work table to hold the reformatted data, and
-- ultimately, the results
CREATE  TABLE #Work
    (
    Acct_No         VARCHAR(20) NOT NULL,
    MonthDate       DATETIME NOT NULL,
    MonthRate       DECIMAL(19,12) NOT NULL,
    Amount          DECIMAL(19,12) NOT NULL,
    InterestAmount  DECIMAL(19,12) NOT NULL,
    RunningTotal    DECIMAL(19,12) NOT NULL,
    RowRank         BIGINT NOT NULL
    );

-- Prepare the set-based iteration method
WITH    Accounts
AS      (
        -- Get a list of the account numbers
        SELECT  DISTINCT Acct_No 
        FROM    #Refunds
        ),
        Rates
AS      (
        -- Apply all the accounts to all the rates
        SELECT  A.Acct_No,
                R.[Year],
                R.[Month],
                MonthRate = R.InterestRate / 12
        FROM    #InterestRates R
        CROSS 
        JOIN    Accounts A
        ),
        BaseData
AS      (
        -- The basic data we need to work with
        SELECT  Acct_No = ISNULL(R.Acct_No,''),
                MonthDate = ISNULL(DATEADD(MONTH, R.[Month], DATEADD(YEAR, R.[year] - 1900, 0)), 0),
                R.MonthRate,
                Amount = ISNULL(RF.Amount,0),
                InterestAmount = ISNULL(RF.Amount,0) * R.MonthRate,
                RunningTotal = ISNULL(RF.Amount,0)
        FROM    Rates R
        LEFT
        JOIN    #Refunds RF
                ON  RF.Acct_No = R.Acct_No
                AND RF.[Year] = R.[Year]
                AND RF.[Month] = R.[Month]
        )
-- Basic data plus a rank id, numbering the rows by MonthDate, and resetting to 1 for each new Account
INSERT  #Work
        (Acct_No, MonthDate, MonthRate, Amount, InterestAmount, RunningTotal, RowRank)
SELECT  BD.Acct_No, BD.MonthDate, BD.MonthRate, BD.Amount, BD.InterestAmount, BD.RunningTotal,
        RowRank = RANK() OVER (PARTITION BY BD.Acct_No ORDER BY MonthDate)
FROM    BaseData BD;

-- An index to speed the next stage (different from that used with the Quirky Update method)
CREATE UNIQUE CLUSTERED INDEX nc1 ON #Work (RowRank, Acct_No);

-- Iteration variables
DECLARE @Rank       BIGINT,
        @RowCount   INTEGER;

-- Initialize
SELECT  @Rank = 1,
        @RowCount = 1;

-- This is the iteration bit, processes a rank id per iteration
-- The number of rows processed with each iteration is equal to the number of groups in the data
-- More groups --> greater efficiency
WHILE   (1 = 1)
BEGIN
        SET @Rank = @Rank + 1;

        -- Set-based update with running totals for the current rank id
        UPDATE  This
        SET     InterestAmount = (Previous.RunningTotal + This.Amount) * This.MonthRate,
                RunningTotal = Previous.RunningTotal + This.Amount + (Previous.RunningTotal + This.Amount) * This.MonthRate
        FROM    #Work This
        JOIN    #Work Previous
                ON  Previous.Acct_No = This.Acct_No
                AND Previous.RowRank = @Rank - 1
        WHERE   This.RowRank = @Rank;

        IF  (@@ROWCOUNT = 0) BREAK;
END;

-- Show the results in natural order
SELECT  *
FROM    #Work
ORDER   BY
        Acct_No, RowRank;

In SQL Server 2012, you could use the windowing function extensions e.g. SUM OVER (ORDER BY).