Sql-server – SQL Server – Multiple running totals

performancequeryscalabilitysql server

I've got a base table with transactions and I need to create a table with running totals. I need them to be per account and and also have a few running totals for each account (depending on the transaction type), and inside that, some running totals per sub-account.

My base table has these fields (more or less) :

AccountID  |  SubAccountID   |  TransactionType  |  TransactionAmount

Considering I've got about 4 types of running totals per Account/TransactionType and 2 more running totals per Account/SubAccount/TransactionType , and I've got about 2M accounts with about 10 sub accounts each, and I'm getting about 10K transactions every minute (at maximum load), how would you do it?

It's also a must that this run asynchronously via an SQL job, creating the aggregations without being part of the transactions themselves.

I'm pretty stuck using a cursor here – which takes way too long. I'd really appreciate any advice / articles that are doing more or less the same.

Best Answer

Asynchronous implies that the running totals do not need to be completely accurate at all times, or your data change patterns are such that a one-off running total build will be valid and accurate until the next load. Anyway, I'm sure you have thought that part through, so I won't labour the point.

Your main options for a high-performance, supported, method are a SQLCLR function/procedure, or an UPDATE based on Hugo Kornelis' set-based iteration method. The SQLCLR method (implemented in a procedure, but reasonably easy to translate) can be found here.

I haven't been able to find Hugo's method online, but it is detailed in the excellent MVP Deep Dives (Volume 1). Sample code to illustrate Hugo's method (copied from one of my posts on another site you may not have a login for) is shown below:

-- A work table to hold the reformatted data, and
-- ultimately, the results
CREATE  TABLE #Work
    (
    Acct_No         VARCHAR(20) NOT NULL,
    MonthDate       DATETIME NOT NULL,
    MonthRate       DECIMAL(19,12) NOT NULL,
    Amount          DECIMAL(19,12) NOT NULL,
    InterestAmount  DECIMAL(19,12) NOT NULL,
    RunningTotal    DECIMAL(19,12) NOT NULL,
    RowRank         BIGINT NOT NULL
    );

-- Prepare the set-based iteration method
WITH    Accounts
AS      (
        -- Get a list of the account numbers
        SELECT  DISTINCT Acct_No 
        FROM    #Refunds
        ),
        Rates
AS      (
        -- Apply all the accounts to all the rates
        SELECT  A.Acct_No,
                R.[Year],
                R.[Month],
                MonthRate = R.InterestRate / 12
        FROM    #InterestRates R
        CROSS 
        JOIN    Accounts A
        ),
        BaseData
AS      (
        -- The basic data we need to work with
        SELECT  Acct_No = ISNULL(R.Acct_No,''),
                MonthDate = ISNULL(DATEADD(MONTH, R.[Month], DATEADD(YEAR, R.[year] - 1900, 0)), 0),
                R.MonthRate,
                Amount = ISNULL(RF.Amount,0),
                InterestAmount = ISNULL(RF.Amount,0) * R.MonthRate,
                RunningTotal = ISNULL(RF.Amount,0)
        FROM    Rates R
        LEFT
        JOIN    #Refunds RF
                ON  RF.Acct_No = R.Acct_No
                AND RF.[Year] = R.[Year]
                AND RF.[Month] = R.[Month]
        )
-- Basic data plus a rank id, numbering the rows by MonthDate, and resetting to 1 for each new Account
INSERT  #Work
        (Acct_No, MonthDate, MonthRate, Amount, InterestAmount, RunningTotal, RowRank)
SELECT  BD.Acct_No, BD.MonthDate, BD.MonthRate, BD.Amount, BD.InterestAmount, BD.RunningTotal,
        RowRank = RANK() OVER (PARTITION BY BD.Acct_No ORDER BY MonthDate)
FROM    BaseData BD;

-- An index to speed the next stage (different from that used with the Quirky Update method)
CREATE UNIQUE CLUSTERED INDEX nc1 ON #Work (RowRank, Acct_No);

-- Iteration variables
DECLARE @Rank       BIGINT,
        @RowCount   INTEGER;

-- Initialize
SELECT  @Rank = 1,
        @RowCount = 1;

-- This is the iteration bit, processes a rank id per iteration
-- The number of rows processed with each iteration is equal to the number of groups in the data
-- More groups --> greater efficiency
WHILE   (1 = 1)
BEGIN
        SET @Rank = @Rank + 1;

        -- Set-based update with running totals for the current rank id
        UPDATE  This
        SET     InterestAmount = (Previous.RunningTotal + This.Amount) * This.MonthRate,
                RunningTotal = Previous.RunningTotal + This.Amount + (Previous.RunningTotal + This.Amount) * This.MonthRate
        FROM    #Work This
        JOIN    #Work Previous
                ON  Previous.Acct_No = This.Acct_No
                AND Previous.RowRank = @Rank - 1
        WHERE   This.RowRank = @Rank;

        IF  (@@ROWCOUNT = 0) BREAK;
END;

-- Show the results in natural order
SELECT  *
FROM    #Work
ORDER   BY
        Acct_No, RowRank;

In SQL Server 2012, you could use the windowing function extensions e.g. SUM OVER (ORDER BY).

Related Solutions

Sql-server – SQL Select taking too much time to execute

I think in a million records query, you have to avoid things like OUTER JOINS. I suggest you use UNION ALL Instead of LEFT JOIN. As long as I think CROSS APPLY is more efficient than sub-query in the select clause I will modify the query written by Conard Frix, which I think is correct.

now: when I started to modify your query I noticed that you have a WHERE clause saying: JoinedTable.WhereColumn IN (1, 3). in this case, if the field is null the condition will become false. then why are you using LEFT JOIN while you are filtering null valued rows? just replace LEFT JOIN With INNER JOIN, I guarantee that it will become faster.

about INDEX:

please note that when you have an index on a table, say

table1(a int, b nvarchar)

and your index is :

nonclustered index ix1 on table1(a)

and you want to do something like this:

select a,b from table1
where a < 10

in your index you have not included the column b so what happens?

if sql-server uses your index, it will have to search in the index, called "Index Seek" and then refer to main table to get column b, called "Look Up". This procedure might take much longer than scanning the table itself: "Table Scan".

but based on the statistics that sql-server has, in such situations, it might not use your index at all.

so first of all check the Execution Plan to see if the index is used at all.

if yes or no both, alter your index to include all columns that you are selecting. say like:

nonclustered index ix1 on table1(a) include(b)

in this case Look Up will not be needed, and your query will execute so much faster.

Sql-server – Which of these table designs is better for performance

A billion records a year isn't much.

With partitioning (per Costtype maybe) and archiving it is manageable.

The number of data items to store is still 200k * 13 * N. As columns, you'll get less rows per page and it will take more space than as rows. You may gain if "CostType1" is not a fixed length datatype, but it's marginal.

"KISS" as they say

Best Answer

Related Solutions

Sql-server – SQL Select taking too much time to execute

Sql-server – Which of these table designs is better for performance

Related Question