SQL Server Query Performance – Efficient Ways to Run First/Max Transaction Query

performancequery-performancesql serversql-server-2008-r2

For each account, I need the first transaction date, the largest transaction amount, and the time taken to get from first to largest. Data is truncated and reloaded frequently, and I have read-only access (no schema or index changes). The table has well over a million rows, so creating and updating a temp table is way too slow; cross apply seems inefficient as well. With a CTE and window function I only have to hit the table twice. But is there a better approach? Simplified example of the table and my query follow:

CREATE TABLE [dbo].[trans](
    [Tran_ID] [int] IDENTITY(1,1) NOT NULL,
    [Account_ID] [varchar](50) NULL,
    [Tran_Date] [datetime] NOT NULL,
    [Tran_Amount] [money] NOT NULL
) ON [PRIMARY];

EDIT: @Peter the table has the below clustered index. This is the only index, there are no primary or foreign keys or other constraints.

CREATE CLUSTERED INDEX [idx_trans_ID_DATE] ON [dbo].[trans]
(
    [Account_ID] ASC,
    [Tran_Date] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
GO


WITH HighestTran AS (
            SELECT 
                Account_ID,
                Tran_Date,
                Tran_Amount,
                ROW_NUMBER() OVER (PARTITION BY Account_ID ORDER BY Tran_Date ASC) AS FirstTranRow,
                ROW_NUMBER() OVER (PARTITION BY Account_ID ORDER BY Tran_Amount DESC) AS MaxTranRow
            FROM 
                dbo.trans
            WHERE 
                Tran_Amount > 0 
            )

SELECT 
    t1.Account_ID,
    t1.Tran_Date,
    t2.Tran_Date,
    t1.Tran_Amount,
    t2.Tran_Amount
FROM
    HighestTran t1
    INNER JOIN HighestTran t2 ON t1.Account_ID = t2.Account_ID AND t1.FirstTranRow = 1 and t2.MaxTranRow = 1
ORDER BY t1.Account_ID;

Best Answer

A slight variation on your's and Thomas's that will do it with a single index scan.

WITH FlagFirstMax AS 
    (
    SELECT
        Account_ID,
        Tran_Date,
        Tran_Amount,
        ROW_NUMBER() OVER (PARTITION BY Account_ID ORDER BY Tran_Date ASC) isFirst,
        ROW_NUMBER() OVER (PARTITION BY Account_ID ORDER BY Tran_Amount DESC, Tran_Date ASC) isMax
    FROM dbo.trans
    WHERE Tran_Amount > 0
    )
SELECT Account_ID,
    MAX(CASE WHEN isFirst = 1 THEN Tran_Date END) First_Tran_Date,
    DATEDIFF(
        day,
        MAX(CASE WHEN isFirst = 1 THEN Tran_Date END),
        MAX(CASE WHEN isMax = 1 THEN Tran_Date END)
        ) Day_Duration,
    MAX(CASE WHEN isMax = 1 THEN Tran_Amount END) Max_Tran_Amount
FROM FlagFirstMax
GROUP BY Account_ID;