I have an aggregate query that has a lot of columns and generate a huge data set:
SELECT column1,
column2,
...
column20,
sum(column21),
...
sum(column40)
INTO #Output
FROM #Ledger
GROUP BY column1,
...
column20
The input table (#Ledger
) has 18m rows and the result table (#Output
) has 600k rows. The query took 9 minutes. Is there anything I can do to make it faster? Here's the execution plan:
There's no index on the #Ledger
table and the exclamation point in Sort gives the following warning:
Operator used tempdb to spill data during execution with spill level 1 and 4 spilled thread(s), Sort wrote 1541592 pages to and read 1541592 pages from tempdb with granted memory 3752160KB and used memory 3681824KB
Best Answer
Looking at your query plan, it seems that the optimizer has chosen a Stream Aggregate operator to handle your
GROUP BY
. The downside of these is that they require ordered data, which a HEAP is not.David's suggestion is perfectly valid. Adding a clustered index may remove the Sort operator, so performance testing that vs. my suggestion is up to you.
If indexing the temp table doesn't give you the performance you need, applying an
OPTION(HASH GROUP);
hint to the query will influence the optimizer to use a Hash Match Aggregate rather than a Stream Aggregate.Hash Match Aggregates don't require sorted data, but may also spill to disk the way the Sort spilled in the Stream Aggregate plan.