SQL Server CTE – Optimization Techniques

cteperformancequery-performancesql serversql-server-2008-r2

I want to take aggregate data for each sales person employed and display the Salesname, saleamount, and saleshippingamount. I have the query set to use 2 UPDATE CTE statements to update a table that house the master data for each employee. I am updating because I need aggregate totals to be displayed.

The syntax takes roughly 10 minutes to complete and between the two tables there are roughly 60,000 rows to process. Is a CTE The wrong way to go about this?

Below is sample DDL. The employee names are populated from the HR table, but I did not show DDL for this, just an insert statement.

Create Table #FinalData
(
    employee varchar(400)
    ,tr float
    ,tf float
)
Create Table #TR
(
    employee varchar(400)
    ,saleamount float
    ,saledate date
)
Create Table #TF
(
    employee varchar(400)
    ,saleshippingamt float
    ,saledate date
)
INSERT INTO #TR (employee, saleamount, saledate) VALUES
('Employee 1', '12.63', '2017-01-01'), ('Employee 1', '15.00' ,'2017-01-02')
,('Employee 2', '14.00', '2017-01-03'), ('Employee 2' ,'12.00', '2017-01-03')
,('Employee 3', '16.00', '2017-01-03'), ('Employee 3', '13.00', '2017-01-04')
INSERT INTO #TF (employee, saleshippingamt, saledate) VALUES
('Employee 1', '5.00', '2017-01-01'), ('Employee 1', '6.00' ,'2017-01-02')
,('Employee 2', '5.50', '2017-01-03'), ('Employee 2' ,'5.00', '2017-01-03')
,('Employee 3', '6.00', '2017-01-03'), ('Employee 3', '6.00', '2017-01-04')
INSERT INTO #FINALDATA (employee) Values
('Employee 1'), ('Employee 2'), ('Employee 3')

;With TR As 
(
    SELECT 
    employee
    ,SUM(saleamount) Totals
    FROM #TR
    WHERE saledate BETWEEN '2017-01-01' AND '2017-01-25'
    GROUP BY employee
)
UPDATE t
SET tr = t2.TotalCount
FROM #FinalData t
INNER JOIN (Select employee, SUM(Totals) TotalCount
            FROM TR
            GROUP BY employee) As t2
ON t.employee = t2.employee

;With TF As 
(
    SELECT 
    employee
    ,SUM(saleshippingamt) Totals
    FROM #TF
    WHERE saledate BETWEEN '2017-01-01' AND '2017-01-25'
    GROUP BY employee
)
UPDATE t
SET tf = t2.TotalCount
FROM #FinalData t
INNER JOIN (Select employee, SUM(Totals) TotalCount
            FROM TF
            GROUP BY employee) As t2
ON t.employee = t2.employee

Select * FROM #FinalData

And this is my execution plan.

What is an option that I have to optimize this query?

Best Answer

I've finally been able to do some testing and playing with the query and can give you more concrete suggestions. I used AdventureWorks for my sample so I had some actual data to work with.

Option 1 - Add Indexes

This will give you the best performance increase for the least amount of effort. I saw an increase of 40% in query execution in my sample set just by adding three indexes.

Tables #TR and #TF. Add an index on the saledate, employee, and include the amount.

CREATE INDEX IDXNC_TFSaleDateEmployee ON #TF (saledate, employee) INCLUDE (saleshippingamt)

CREATE INDEX IDXNC_TRSaleDateEmployee ON #TR (saledate, employee) INCLUDE (saleamount)

for the #FinalData, you should have a clustered index on the employee field.

CREATE CLUSTERED INDEX IDXC_FinalDataEmployee ON #FinalData (employee)

Option 2 - Change your query

Your query could use some tweaking, it may even allow you to get rid of the temp table all together.

This query also performs a little faster, but the indexes will give you the best increase in performance.

;WITH CTE_Employee AS
    (
    SELECT DISTINCT employee
    FROM (
        SELECT employee
        FROM #TR
        UNION ALL
        SELECT employee
        FROM #TF
        ) AS D
    )
    , CTE_TR AS
    (
    SELECT employee
        , SUM(saleamount) AS TotalDue
    FROM #TR
    WHERE saledate BETWEEN '2017-01-01' AND '2017-01-25'
    GROUP BY employee
    )
    , CTE_TF AS
    (
    SELECT employee
        , SUM(saleamount) AS TotalDue
    FROM #TR
    WHERE saledate BETWEEN '2017-01-01' AND '2017-01-25'
    GROUP BY employee
    )

SELECT S.employee
    , TR.TotalDue
    , TF.TotalDue
FROM CTE_Employee AS S
    LEFT OUTER JOIN CTE_TR AS TR ON TR.employee = S.employee
    LEFT OUTER JOIN CTE_TF AS TF ON TF.employee = S.employee

General Comments

You can name CTEs however you like, but it does confuse things a little bit if you give them the same name as the table they represent. I like to give them a CTE_* prefix, but it's optional.

You can also chain CTEs as I've done here.

In your original query, you were doing two aggregate operations when you were getting the totals for the totalcount in the #FinalData. That would also be something to keep an eye out for, you can see that it's unnecessary, but SQL will do the sort and sum operation multiple times. You could just join the CTE table on instead of doing a sub-select.

You Had

UPDATE t
SET tf = t2.TotalCount
FROM #FinalData t
INNER JOIN (Select employee, SUM(Totals) TotalCount
            FROM TF
            GROUP BY employee) As t2
ON t.employee = t2.employee

Should be

UPDATE t
SET tf = t2.TotalCount
FROM #FinalData t
INNER JOIN TF As t2 ON t.employee = t2.employee