SQL Server – Query memory grant and tempdb spill

Tags: execution-plan, sorting, sql-server, sql-server-2012, tempdb

I have a long-running query (a 100-million-row fact table joined to a number of small dimension tables, then a GROUP BY) that spills to tempdb even though, after some tweaking, the cardinality estimate (CE) is very close to the actual number of rows. See plan:

[Screenshot: execution plan]

Looking for an explanation, I noticed the following memory grant info:

[Screenshot: memory grant information]

Environment: SQL Server 2012 SP1 Enterprise, server RAM 256 GB, SQL Server max memory 200 GB, buffer pool size 42 GB, workspace max size 156 GB (GrantedMemory = 156 * 25% ~= 38 GB)
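A sketch of how those numbers can be confirmed on the server itself, using the resource semaphore DMV (by default, a single query's grant is capped at 25% of the semaphore's target workspace memory). The query below is illustrative; column availability matches `sys.dm_exec_query_resource_semaphores` as documented:

```sql
-- Inspect total workspace memory and current grants per resource semaphore
SELECT
    pool_id,
    resource_semaphore_id,
    target_memory_kb  / 1024 AS target_memory_mb,   -- total workspace memory target
    granted_memory_kb / 1024 AS granted_memory_mb,  -- memory currently granted
    grantee_count,                                  -- queries holding grants
    waiter_count                                    -- queries waiting for grants
FROM sys.dm_exec_query_resource_semaphores;
```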

Questions

  1. Does that mean that, no matter how good the CE is, the query has no chance of avoiding the spill, since its maximum memory grant is hard-capped at 38 GB?
  2. Does the query optimizer not take the maximum query memory into consideration when building the plan? (Forcing a Hash Match aggregate would eliminate the sort step and significantly improve query performance; unfortunately, the actual query comes from Cognos and we have no control over it.)
  3. Would increasing the 25% cap to close to 100% be a sensible option here? (Assuming access to the server can be controlled to limit the number of concurrent query requests.)

Anonymized query plan at Paste The Plan

When forcing a hash match aggregate (instead of a sort + stream aggregate), the query consistently finishes 3–4 times faster. Unfortunately, the actual query comes from Cognos, and we have no way of changing it.

There is no hash spill in the hash aggregate plan. The query optimizer won't pick the hash match aggregate because, looking at the operator costs, the estimated CPU cost of the hash aggregate is 2–3 times that of the stream aggregate.

In both the stream and hash aggregate plans, the estimated number of output rows is exactly the same as the input (~100 million rows).

The query uses a single NC columnstore index, and column stats are all regularly updated.

Best Answer

  1. Does that mean that, no matter how good the CE is, the query has no chance of avoiding the spill, since its maximum memory grant is hard-capped at 38 GB?

The overall memory grant for your query appears to be capped at 37 GB given your current hardware and SQL Server configuration.

If the Sort cannot be performed within its Memory Fraction (0.860743 in that plan) of the query memory grant, it spills to tempdb. Note also that this parallel Sort splits its fraction of the grant evenly across 12 threads, and that allocation cannot be rebalanced at runtime, so an uneven distribution of rows across threads can cause a spill even when the total grant would have been sufficient.
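One way to see the grant-versus-need gap while the query runs is the memory grants DMV; a sketch (column names per `sys.dm_exec_query_memory_grants`):

```sql
-- Compare what the plan wanted (ideal) with what it was granted and used.
-- Run from a separate session while the problem query executes.
SELECT
    session_id,
    dop,                                         -- degree of parallelism (12 here)
    requested_memory_kb / 1024 AS requested_mb,
    granted_memory_kb   / 1024 AS granted_mb,
    ideal_memory_kb     / 1024 AS ideal_mb,      -- what the plan would have liked
    used_memory_kb      / 1024 AS used_mb,
    max_used_memory_kb  / 1024 AS max_used_mb
FROM sys.dm_exec_query_memory_grants
WHERE session_id <> @@SPID;
```

If `ideal_mb` is well above `granted_mb`, the spill is expected regardless of estimate quality.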

  2. Does the query optimizer not take the maximum query memory into consideration when building the plan? (Forcing a Hash Match aggregate would eliminate the sort step and significantly improve query performance; unfortunately, the actual query comes from Cognos and we have no control over it.)

Yes, it does, but only as an input to the general costing framework. The optimizer chooses the plan that looks cheapest according to its model. If the numbers are wrong, the plan choice is not likely to be optimal.

In your case, the actual number of rows produced by the Stream Aggregate is significantly less than estimated:

[Screenshot: Stream Aggregate estimated vs. actual rows]

The optimizer favours a Hash Aggregate when fewer, larger groups are expected (since each distinct group occupies a slot in the hash table). The incorrect density information leads it to the wrong choice of Sort + Stream Aggregate.

The best plan would likely be a hash join instead of the nested loops join, and a hash aggregate. This should be able to extend batch mode processing to the important aggregation step.
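To validate that plan shape before committing to a plan guide, you could test it with query-level hints. The tables below are hypothetical stand-ins for the anonymized Cognos query; note that a query-level `HASH JOIN` hint forces every join in the statement, which is acceptable for a test:

```sql
-- Hypothetical reproduction of the query shape: force hash joins
-- and a hash aggregate to compare against the Sort + Stream Aggregate plan.
SELECT d.CalendarYear,
       SUM(f.SalesAmount) AS total_sales
FROM dbo.FactSales AS f                       -- stand-in for the 100M-row fact table
JOIN dbo.DimDate   AS d ON d.DateKey = f.DateKey  -- stand-in dimension
GROUP BY d.CalendarYear
OPTION (HASH GROUP, HASH JOIN);
```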

SQL Server 2012 was quite limited in its transitions between row and batch mode. The execution engine never returns to batch mode once row mode processing has begun (so row-batch-row is ok, but batch-row-batch is not).

  3. Would increasing the 25% cap to close to 100% be a sensible option here? (Assuming access to the server can be controlled to limit the number of concurrent query requests.)

If you want to increase the amount of memory available to this query, you could certainly do so through your Resource Governor configuration. Increase the limit in stages to see if you can find a good compromise; I would be wary of going too close to 100%.
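A sketch of raising the per-query grant cap for the default workload group (25 is the default; any workload running in that group is affected, so test intermediate values first):

```sql
-- Allow a single query in the default group to request up to 50%
-- of the workspace memory instead of the default 25%.
ALTER WORKLOAD GROUP [default]
    WITH (REQUEST_MAX_MEMORY_GRANT_PERCENT = 50);

-- Apply the change.
ALTER RESOURCE GOVERNOR RECONFIGURE;
```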

If the query is suitable for a plan guide, try a HASH GROUP hint.
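A sketch of such a plan guide. The `@stmt` text must match what Cognos actually submits character-for-character (including whitespace); the statement below is a hypothetical stand-in:

```sql
-- Attach OPTION (HASH GROUP) to a specific ad hoc statement via a plan guide.
-- @stmt must exactly match the text Cognos sends; this one is illustrative only.
EXEC sp_create_plan_guide
    @name            = N'CognosHashAggGuide',
    @stmt            = N'SELECT d.CalendarYear, SUM(f.SalesAmount) AS total_sales
FROM dbo.FactSales AS f
JOIN dbo.DimDate AS d ON d.DateKey = f.DateKey
GROUP BY d.CalendarYear',
    @type            = N'SQL',
    @module_or_batch = NULL,    -- NULL: match @stmt as a standalone batch
    @params          = NULL,
    @hints           = N'OPTION (HASH GROUP)';
```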

Longer term, upgrading to SQL Server 2016 will pay dividends, since more operators can execute in batch mode (including Sort), dynamic memory grant increases become possible, and there are about a thousand other improvements to columnstore and batch mode processing in general.