I have two tables in Sql Server 2008 that look something like this:
Transaction_Details:
Dataset,
Department,
Charge_ID,
Transaction_Type,
dos_month,
post_month,
amount
Charge_Summary
Dataset,
Department,
Charge_ID,
dos_month,
Ins1_Category
The Output I'm looking for would look something like this:
Dataset,
Department,
Ins1_Category,
dos_month,
Post_Month,
Total_Payments,
Total_Charges,
Previous_Balance — Defined as total amount for all matching transactions with a post_month previous to the current month
The SQL I'm using below gives the expected output, but there is one index seek in the execution plan taking up over 90% of the cost. This is in a datawarehouse operation, so there is no other activity on the server aside from this query, no other reads or writes to worry about. The execution time of ~30 minutes is acceptable for what it's being used for, but I'd still like some advice if there's a better way to do this.
Link to execution plan on pastebin
SELECT
t1.Dataset,
t1.Department,
c1.Ins1_Category,
t1.dos_month,
t1.post_month,
SUM(case
when t1.transaction_type = 'payment' THEN t1.amount
ELSE 0
END
) AS total_payments,
SUM(case
when t1.transaction_type = 'adjustment' THEN t1.amount
ELSE 0
END
) AS total_adjustments,
SUM(case
when t1.transaction_type = 'charge' THEN t1.amount
ELSE 0
END
) AS total_charges,
(
SELECT
SUM(t2.amount)
FROM
Transaction_Details t2
LEFT JOIN
Charge_Summary c2
ON
t2.Dataset = c2.Dataset AND
t2.Charge_ID = c2.Charge_ID
WHERE
t1.Dataset = t2.Dataset AND
t1.Department = t2.Department AND
t1.dos_month = t2.dos_month AND
t1.post_month > t2.post_month AND
t2.Charge_ID IS NOT NULL AND
c2.Ins1_Category = c1.Ins1_Category
) as previous_balance
FROM
Transaction_Details t1
LEFT JOIN
Charge_Summary c1
ON
t1.Dataset = c1.Dataset AND
t1.Charge_ID = c1.Charge_ID
WHERE
t1.Charge_ID is not null
GROUP BY
t1.Dataset,
t1.Department,
c1.Ins1_Category,
t1.dos_month,
t1.post_month
Best Answer
In addition to the T-SQL techniques discussed in Aaron Bertrands article, "Best approaches for grouped running totals", you might want to look at a
SQLCLR
procedure (if you are running less than SQL Server 2012).The main advantages are that a
SQLCLR
procedure requires only a single scan of the source records, and benefits from the increased execution speed of compiled code. The sample below is taken from Aaron's post:DDL
Sample data
SQL CLR procedure DDL
Usage
Output
Execution plan
SQL CLR source code