Sql-server – How to manipulate the query plan so that I can have more control on memory grants

execution-planoptimizationplan-guidessql server 2014statistics

Let's say I have a complex query and for this query there are huge
Differences between the estimated and actual execution plans.

These differences are causing spillage into tempDb as described here.

The problem is that in this query there are tables that are too big for me to update the statistics.

How can I manipulate the query plan so that I can have more control on memory grants?

I want to avoid memory spillage to tempdb.

In order for this question to be objective and not too broad, we could concentrate for instance on the hash join operator, when can I replace it by a merge join operator?

Are there other operators with memory grant that could be replaced depending on the work load?

Understanding SQL server memory grant

-- Search cache for queries with memory grants:
SELECT t.text, cp.objtype,qp.query_plan
FROM sys.dm_exec_cached_plans AS cp
JOIN sys.dm_exec_query_stats AS qs ON cp.plan_handle = qs.plan_handle
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS t
WHERE qp.query_plan.exist('declare namespace n=''http://schemas.microsoft.com/sqlserver/2004/07/showplan''; //n:MemoryFractions') = 1

Best Answer

I suggest thinking about your problem in a different way. It's not true that there's some inherit advantage to a hash join that makes the SQL Server query optimizer want to pick it (except in batch mode with CCIs). The query optimizer chose a hash join because it thought it was a better option than a merge or nested loop join. Nested loop joins are a good choice when the outer table is small and the inner table is indexed. Merge joins are a good choice when the inputs are sorted and you need to access a non-small percentage of rows from the tables. Hash joins are a good choice when you don't have a better choice than scanning both tables. In most cases a hash join when it is not appropriate indicates missing indexes.

If you have a legitimate need to encourage a merge join there are a few things that you can do. If it is a 1:1 join define constraints so that SQL Server knows that. The query optimizer costs many-to-many merge joins much higher than 1:1 joins. Create indexes so that your data is presorted. If you cannot do that then make sure the sorts are in a parallel zone of the plan. Sorts benefit greatly from parallelism. Note that a merge join can lead to tempdb usage if there are large sorts or if the merge join is many-to-many. There may be other scenarios as well.

Nested loop joins aren't necessary a bad choice for a large, complicated plan. If you have a legitimate need to encourage a nested loop join make sure you have indexes that enable that. Your query may benefit from a parallel apply type of plan. Those plans can use less memory than a plan with large parallel hash joins. You should be aware that in some cases nested loop joins will do a sort on the outer table before the join operator to improve performance.

On the subject of memory, you could make sure that your server has an appropriate amount of memory and that other queries on the server aren't using too much memory. Otherwise, trying to pick query plan operators on the basis on memory seems kind of backwards. You should really fix your statistics. Use table partitioning, a partitioned view, a smaller sample size, filtered indexes, etc. You're playing a game that's very difficult to win if you don't fix your cardinality estimates.