How to Improve SUM() Performance in SQL Server

sql serversql-server-2012

I have an aggregate query that has a lot of columns and generate a huge data set:

SELECT      column1,
            column2,
            ...
            column20,
            sum(column21),
            ...
            sum(column40)
INTO        #Output
FROM        #Ledger
GROUP BY    column1,
            ...
            column20

The input table (#Ledger) has 18m rows and the result table (#Output) has 600k rows. The query took 9 minutes. Is there anything I can do to make it faster? Here's the execution plan:

execution plan

There's no index on the #Ledger table and the exclamation point in Sort gives the following warning:

Operator used tempdb to spill data during execution with spill level 1 and 4 spilled thread(s), Sort wrote 1541592 pages to and read 1541592 pages from tempdb with granted memory 3752160KB and used memory 3681824KB

Best Answer

Looking at your query plan, it seems that the optimizer has chosen a Stream Aggregate operator to handle your GROUP BY. The downside of these is that they require ordered data, which a HEAP is not.

David's suggestion is perfectly valid. Adding a clustered index may remove the Sort operator, so performance testing that vs. my suggestion is up to you.

If indexing the temp table doesn't give you the performance you need, applying an OPTION(HASH GROUP); hint to the query will influence the optimizer to use a Hash Match Aggregate rather than a Stream Aggregate.

Hash Match Aggregates don't require sorted data, but may also spill to disk the way the Sort spilled in the Stream Aggregate plan.