SQL Server 2016 – Excessive Memory Grant Warning on Poor Performing Query


I have a relatively large database (550 GB) on a SQL Server 2016 Enterprise Edition instance. Max server memory is set to 112 GB of the 128 GB of RAM available to the OS, and the database is at the latest compatibility level, 130. Developers have complained about the query below. Executed in isolation, it completes in about 30 seconds, which is acceptable to them. When they run their processes at scale, however, the same query executes many times concurrently across several threads, and that is when execution time suffers and throughput drops. The problematic T-SQL is:

select distinct dg.entityId, et.EntityName, dg.Version
from DataGathering dg with(nolock)
    inner join entity e with(nolock)
        on e.EntityId = dg.EntityId
    inner join entitytype et with(nolock)
        on et.EntityTypeID = e.EntityTypeID
        and et.EntityName = 'Account_Third_Party_Details'
    inner join entitymapping em with(nolock)
        on em.ChildEntityId = dg.EntityId
        and em.ParentEntityId = -1
where dg.EntityId = dg.RootId

union all

select distinct dg1.EntityId, et.EntityName, dg1.version
from datagathering dg1 with(nolock)
    inner join entity e with(nolock)
        on e.EntityId = dg1.EntityId
    inner join entitytype et with(nolock)
        on et.EntityTypeID = e.EntityTypeID
        and et.EntityName = 'TIN_Details'
where dg1.EntityId = dg1.RootId
    and dg1.EntityId not in (
        select distinct ChildEntityId
        from entitymapping
        where ChildEntityId = dg1.EntityId
        and ParentEntityId = -1)

The actual execution plan shows an "Excessive Grant" memory grant warning.

The graphical execution plan can be found here:

https://www.brentozar.com/pastetheplan/?id=r18ZtCidN
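The plan captures the grant for a single execution. To see what happens under the concurrent load the developers describe, a diagnostic sketch like the one below could be run while their processes are active. It reads the sys.dm_exec_query_memory_grants DMV. A large gap between granted_memory_kb and max_used_memory_kb suggests an oversized grant. Rows whose grant_time is still NULL are queries queued for memory (they typically show up as RESOURCE_SEMAPHORE waits), which would be consistent with throughput dropping only at scale. This is a starting point for investigation, not a fix.

```sql
-- Diagnostic sketch: one row per request that currently holds, or is
-- queued for, a query memory grant. Run it while the workload is active.
SELECT
    mg.session_id,
    mg.dop,
    mg.requested_memory_kb,
    mg.granted_memory_kb,      -- NULL while the request is still waiting
    mg.max_used_memory_kb,     -- peak memory actually used so far
    mg.ideal_memory_kb,
    mg.request_time,
    mg.grant_time,             -- NULL while the request is still waiting
    t.text AS query_text
FROM sys.dm_exec_query_memory_grants AS mg
OUTER APPLY sys.dm_exec_sql_text(mg.sql_handle) AS t
ORDER BY mg.requested_memory_kb DESC;
```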

Below are the row counts and sizes of the tables touched by this query. The most expensive operator is an index scan of a nonclustered index on the DataGathering table, which makes sense given the size of that table compared to the others. I understand why the memory grant is required: I believe it is due to how the query is written, which requires multiple sorts and hash operators. What I need advice on is how to avoid the memory grants. T-SQL and refactoring code are not my strong point, so is there a way to rewrite this query so that it performs better? If I can tune the query to run faster in isolation, hopefully the benefits will carry over to when it runs at scale, which is when performance starts to suffer. Happy to provide more information, and hoping to learn something from this!
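The row counts and sizes can be pulled with a query along these lines. This is a sketch that assumes the four tables live in the current database. It groups by schema in case a name exists in more than one schema.

```sql
-- Row count (taken from the heap or clustered index only, so rows are not
-- double-counted across nonclustered indexes) and space used per table.
-- reserved_mb and used_mb include every index on the table (8 KB pages).
SELECT
    SCHEMA_NAME(t.schema_id)                      AS schema_name,
    t.name                                        AS table_name,
    SUM(CASE WHEN ps.index_id IN (0, 1)
             THEN ps.row_count ELSE 0 END)        AS row_count,
    SUM(ps.reserved_page_count) * 8 / 1024.0      AS reserved_mb,
    SUM(ps.used_page_count) * 8 / 1024.0          AS used_mb
FROM sys.dm_db_partition_stats AS ps
INNER JOIN sys.tables AS t
    ON t.object_id = ps.object_id
WHERE t.name IN (N'DataGathering', N'Entity', N'EntityType', N'EntityMapping')
GROUP BY t.schema_id, t.name
ORDER BY reserved_mb DESC;
```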

After updating statistics on 3 of the tables:

UPDATE STATISTICS Entity WITH FULLSCAN; 
UPDATE STATISTICS EntityMapping WITH FULLSCAN; 
UPDATE STATISTICS EntityType WITH FULLSCAN;

…the execution plan has improved some:

https://www.brentozar.com/pastetheplan/?id=rkVmdkh_4

Unfortunately, the "Excessive Grant" warning is still there.
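One way to confirm that the FULLSCAN updates took effect, and to check how current the statistics are on DataGathering (the largest table, which was not in the list above), is to query sys.dm_db_stats_properties. This is a sketch that assumes the unqualified table names resolve in the current database.

```sql
-- When was each statistic on the four tables last updated, and was it
-- built from every row (rows_sampled = rows, as FULLSCAN produces)?
SELECT
    OBJECT_NAME(s.object_id)  AS table_name,
    s.name                    AS stats_name,
    sp.last_updated,
    sp.rows,
    sp.rows_sampled,
    sp.modification_counter   -- changes to the leading column since the update
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE s.object_id IN (OBJECT_ID(N'DataGathering'),
                      OBJECT_ID(N'Entity'),
                      OBJECT_ID(N'EntityType'),
                      OBJECT_ID(N'EntityMapping'))
ORDER BY table_name, stats_name;
```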

Josh Darnell kindly suggested refactoring the query as shown below to avoid the inhibited parallelism he spotted on one of the operators. The refactored query throws this error:

Msg 4104, Level 16, State 1, Line 7
The multi-part identifier "et.EntityName" could not be bound.

How do I work around that?

DECLARE @tinDetailsId int;

SELECT @tinDetailsId = et.EntityTypeID 
FROM entitytype et 
WHERE et.EntityName = 'TIN_Details';

select distinct dg1.EntityId, et.EntityName, dg1.version
from datagathering dg1 with(nolock)
    inner join entity e with(nolock)
        on e.EntityId = dg1.EntityId
where dg1.EntityId = dg1.RootId
    and e.EntityTypeID = @tinDetailsId
    and dg1.EntityId not in (
        select distinct ChildEntityId
        from entitymapping
        where ChildEntityId = dg1.EntityId
        and ParentEntityId = -1)

UNION ALL

select distinct dg.entityId, et.EntityName, dg.Version
from DataGathering dg with(nolock)
    inner join entity e with(nolock)
        on e.EntityId = dg.EntityId
    inner join entitytype et with(nolock)
        on et.EntityTypeID = e.EntityTypeID
        and et.EntityName = 'Account_Third_Party_Details'
    inner join entitymapping em with(nolock)
        on em.ChildEntityId = dg.EntityId
        and em.ParentEntityId = -1
where dg.EntityId = dg.RootId

Best Answer

This might not help with the memory grant situation (hopefully the additional stats updates will help some with that), but I noticed that parallelism is being inhibited in this query. Check out this part of the plan:

Since there's only one row on the outer side of the nested loops join, all 900k rows are being funneled onto one thread. So despite this query running at DOP 8, this portion of the plan is completely serial. That includes the sort. Here's the XML for that sort:

If at all possible, consider avoiding the join to EntityType, and instead just grabbing that Id and filtering the Entity table with it. This will allow it to just be a predicate on an index scan of the Entity table, hopefully allowing parallelism and speeding up the execution.

Something like this:

DECLARE @tinDetailsId int;

SELECT @tinDetailsId = et.EntityTypeID 
FROM entitytype et 
WHERE et.EntityName = 'TIN_Details';

Which you could then reference in the bottom half of the query, eliminating the join:

select distinct dg1.EntityId, 'TIN_Details' as EntityName, dg1.version
from datagathering dg1 with(nolock)
    inner join entity e with(nolock)
        on e.EntityId = dg1.EntityId
where dg1.EntityId = dg1.RootId
    and e.EntityTypeID = @tinDetailsId
    and dg1.EntityId not in (
        select distinct ChildEntityId
        from entitymapping
        where ChildEntityId = dg1.EntityId
        and ParentEntityId = -1)

You would want to do the same thing with the EntityName "Account_Third_Party_Details" in the top part of the query, as it has the same problem, with twice as many rows.
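Putting both halves together, a sketch of the full refactored query might look like this. The Msg 4104 error in the question comes from the et alias: once a branch no longer joins to EntityType, it cannot reference et.EntityName, so each branch outputs a literal instead. This sketch assumes EntityName is unique in EntityType. If it is not, each variable silently receives the EntityTypeID of just one matching row, with no guarantee which, whereas the original join matched every row. The original NOLOCK hints are kept unchanged.

```sql
-- Look up both EntityTypeIDs once, up front, so neither half of the
-- UNION ALL needs to join to EntityType.
DECLARE @accountDetailsId int,
        @tinDetailsId     int;

SELECT @accountDetailsId = et.EntityTypeID
FROM entitytype AS et
WHERE et.EntityName = 'Account_Third_Party_Details';

SELECT @tinDetailsId = et.EntityTypeID
FROM entitytype AS et
WHERE et.EntityName = 'TIN_Details';

-- Top half: root Account_Third_Party_Details entities mapped to parent -1.
SELECT DISTINCT dg.EntityId,
       'Account_Third_Party_Details' AS EntityName,  -- literal replaces et.EntityName
       dg.Version
FROM DataGathering AS dg WITH (NOLOCK)
INNER JOIN entity AS e WITH (NOLOCK)
    ON e.EntityId = dg.EntityId
INNER JOIN entitymapping AS em WITH (NOLOCK)
    ON em.ChildEntityId = dg.EntityId
   AND em.ParentEntityId = -1
WHERE dg.EntityId = dg.RootId
  AND e.EntityTypeID = @accountDetailsId

UNION ALL

-- Bottom half: root TIN_Details entities NOT mapped to parent -1.
SELECT DISTINCT dg1.EntityId,
       'TIN_Details',
       dg1.Version
FROM datagathering AS dg1 WITH (NOLOCK)
INNER JOIN entity AS e WITH (NOLOCK)
    ON e.EntityId = dg1.EntityId
WHERE dg1.EntityId = dg1.RootId
  AND e.EntityTypeID = @tinDetailsId
  AND dg1.EntityId NOT IN (
        SELECT em.ChildEntityId          -- DISTINCT is unnecessary inside NOT IN
        FROM entitymapping AS em
        WHERE em.ChildEntityId = dg1.EntityId
          AND em.ParentEntityId = -1);
```

If either lookup finds no row, its variable stays NULL and that half returns nothing, which matches what the original inner join to EntityType would have done.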

PS: Totally unrelated to the topic at hand, I noticed that you have NOLOCK hints on nearly all the tables in this query. Make sure that you are aware of the implications. Check out these nifty blog posts on the topic:

Bad habits : Putting NOLOCK everywhere by Aaron Bertrand
The Read Uncommitted Isolation Level by Paul White

Related Solutions

Sql-server – TSQL Execution Plan – Estimated Number of Rows = 1 – Poor Performing Query

I know of three main ways of addressing a query performance issue caused by a cardinality mis-estimate:

1. Giving the optimizer more information

The query optimizer generally works better if it has higher quality information to inform the model. Steps here can include updating statistics, creating new statistics, using the RECOMPILE hint to pass along the literal values or variables, or materializing key intermediate result sets to provide better cardinality estimates or indexing.
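As an illustration of two of those options, here is a sketch using the tables from the question at the top of this page. The statistics name is hypothetical, and whether either change improves the plan has to be tested. The first statement creates multi-column statistics on the two EntityMapping columns the query filters on together. The second uses a statement-level RECOMPILE hint so the optimizer compiles with the runtime value of a local variable instead of an average-density guess.

```sql
-- Hypothetical statistics name; gives the optimizer density information
-- for the (ChildEntityId, ParentEntityId) combination.
CREATE STATISTICS st_EntityMapping_Child_Parent
    ON entitymapping (ChildEntityId, ParentEntityId)
    WITH FULLSCAN;

DECLARE @tinDetailsId int;

SELECT @tinDetailsId = et.EntityTypeID
FROM entitytype AS et
WHERE et.EntityName = 'TIN_Details';

-- OPTION (RECOMPILE) compiles this statement at execution time, when
-- @tinDetailsId has a value, so the EntityTypeID histogram can be used.
SELECT COUNT(*)
FROM entity AS e
WHERE e.EntityTypeID = @tinDetailsId
OPTION (RECOMPILE);
```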

2. Rewriting your query to be clearer to the optimizer

This can include simplifying code to remove redundant filters or refactoring it to be clearer to the optimizer. The query looks complex and we don't have the view code, so it's hard to say more. A few filters in the query appear to be extremely complex. It wouldn't surprise me at all if the optimizer cannot do a good job of guessing how those filters will affect the results.

3. Taking advantage of SQL Server enhancements

Sometimes there are features you can turn on that will make SQL Server do a better job with your workload. If you aren't using trace flag 4199, you could test this query with it. Trace flag 4199 enables a collection of query optimizer fixes that Microsoft has released over the years. In SQL Server 2016, the fixes released before 2016 are on by default for databases at compatibility level 130, while fixes released after that are still gated by trace flag 4199. Trace flag 2301 is a bit less straightforward. It makes some changes to the optimizer around join cardinality estimation, and in a rough sense you can say that the optimizer works harder to find a better plan. It is riskier and not nearly as common as trace flag 4199. It might not be practical, but it is worth mentioning that each new version of SQL Server makes changes to improve query performance. SQL Server 2014 introduced a new cardinality estimator model, which works better for some workloads.
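To compare these options for a single statement rather than instance-wide, the QUERYTRACEON query hint can be used. The sketch below borrows the bottom half of the query from the question at the top of this page. QUERYTRACEON typically requires sysadmin membership. On SQL Server 2016 SP1 and later, the USE HINT options ENABLE_QUERY_OPTIMIZER_HOTFIXES and FORCE_LEGACY_CARDINALITY_ESTIMATION are alternatives that do not require it. Trace flag 9481 forces the pre-2014 (legacy) cardinality estimator.

```sql
-- Baseline: default behavior for the database's compatibility level.
SELECT COUNT(*)
FROM datagathering AS dg1
INNER JOIN entity AS e ON e.EntityId = dg1.EntityId
INNER JOIN entitytype AS et ON et.EntityTypeID = e.EntityTypeID
WHERE et.EntityName = 'TIN_Details'
  AND dg1.EntityId = dg1.RootId;

-- Same statement with optimizer hotfixes (trace flag 4199) enabled.
SELECT COUNT(*)
FROM datagathering AS dg1
INNER JOIN entity AS e ON e.EntityId = dg1.EntityId
INNER JOIN entitytype AS et ON et.EntityTypeID = e.EntityTypeID
WHERE et.EntityName = 'TIN_Details'
  AND dg1.EntityId = dg1.RootId
OPTION (QUERYTRACEON 4199);

-- Same statement with the legacy cardinality estimator (trace flag 9481).
SELECT COUNT(*)
FROM datagathering AS dg1
INNER JOIN entity AS e ON e.EntityId = dg1.EntityId
INNER JOIN entitytype AS et ON et.EntityTypeID = e.EntityTypeID
WHERE et.EntityName = 'TIN_Details'
  AND dg1.EntityId = dg1.RootId
OPTION (QUERYTRACEON 9481);
```

Comparing the three actual plans side by side shows whether estimates or operator choices change.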

For your particular query, I also want to note that it's easy to misread the single row estimate that you're seeing. The estimated number of rows that you see on the inner side of the nested loop is the number of rows returned per iteration of the loop. Seeing one row estimated from a nested loop seek is common and often not a sign of a problem.

However, the cardinality estimate for the outer part of the query is a bit off (36269 actual rows versus 6976 estimated rows). It's perfectly natural to see a high number of logical reads with a nested loop and to suspect that part of the query is slow and needs to be improved. I find it useful to think about what the query optimizer should do instead to get the data it needs. Would a hash join be better? A merge join? A nested loop with a different index?
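One way to answer those questions empirically is to force each join type with a query-level hint and compare elapsed time, logical reads, and the plan. This is a testing sketch only. A query-level join hint applies to every join in the statement, and leaving it in production code removes the optimizer's choice. The query is a simplified join borrowed from the question at the top of this page.

```sql
SET STATISTICS IO ON;    -- logical reads per table in the Messages tab
SET STATISTICS TIME ON;  -- CPU and elapsed time per statement

SELECT COUNT(*)
FROM datagathering AS dg1
INNER JOIN entity AS e ON e.EntityId = dg1.EntityId
WHERE dg1.EntityId = dg1.RootId
OPTION (HASH JOIN);

SELECT COUNT(*)
FROM datagathering AS dg1
INNER JOIN entity AS e ON e.EntityId = dg1.EntityId
WHERE dg1.EntityId = dg1.RootId
OPTION (MERGE JOIN);

SELECT COUNT(*)
FROM datagathering AS dg1
INNER JOIN entity AS e ON e.EntityId = dg1.EntityId
WHERE dg1.EntityId = dg1.RootId
OPTION (LOOP JOIN);

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```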

I don't have the full picture but the nested loop joins that you called out don't look that bad to me. I don't see any key lookups and one of the indexes is covering. One way to move forward is to materialize all of the results of the query up until that point. Gather statistics on the temp table. Then look at a query plan for the adjusted query and see how long it takes to run. If the query plan changes for the better then you have a useful clue on how to make it run faster. If it doesn't change then you can at least get a more precise measurement of what you think the slow part is. Good luck!
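Here is a sketch of that materialize-then-measure approach, using the tables from the question at the top of this page purely as an illustration. #RootEntities and the statistics name are hypothetical. The idea is to persist the intermediate result, give it full-scan statistics, and then see whether the plan for the remaining work changes.

```sql
IF OBJECT_ID(N'tempdb..#RootEntities') IS NOT NULL
    DROP TABLE #RootEntities;

-- Materialize the intermediate result: root rows of DataGathering.
SELECT dg.EntityId, dg.Version
INTO #RootEntities
FROM DataGathering AS dg
WHERE dg.EntityId = dg.RootId;

-- Statistics on the column the next step joins on.
CREATE STATISTICS st_RootEntities_EntityId
    ON #RootEntities (EntityId) WITH FULLSCAN;

-- Continue from the temp table and compare estimated vs. actual rows.
SELECT DISTINCT r.EntityId, et.EntityName, r.Version
FROM #RootEntities AS r
INNER JOIN entity AS e ON e.EntityId = r.EntityId
INNER JOIN entitytype AS et ON et.EntityTypeID = e.EntityTypeID
WHERE et.EntityName = 'TIN_Details';
```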

SQL Server Query Tuning with Temp Table Join – Best Practices

Why does the inner join to a one-record temp table make the query take so much longer?

Without the join, the optimizer is smart enough to work out that it can find the minimum value by reading one row from the end of the index.

Unfortunately, it is not currently equipped to apply the same sort of logic when the query is more complicated (with a join or grouping clause, for example). To work around this limitation, you can rewrite the query to compute local minimums per row in the temporary table, then find the global minimum.

Perhaps the easiest way to express this in T-SQL is to use the APPLY operator:

SELECT
    -- Global minimum
    @tenor_from = MIN(MinMaturityPerCurveID.maturity_date)
FROM #source_price_curve_list AS SPCL
CROSS APPLY
(
    -- Minimum maturity_date per price_curve_id
    SELECT TOP (1) 
        SPC.maturity_date
    FROM  dbo.source_price_curve AS SPC
    WHERE
        SPC.source_curve_def_id = SPCL.price_curve_id
        AND SPC.as_of_date >= @as_of_date_from
    ORDER BY
        SPC.maturity_date ASC
) AS MinMaturityPerCurveID;

Good performance relies on there being many rows per price_curve_id. You may need an index of the form:

CREATE NONCLUSTERED INDEX
    [IX dbo.source_price_curve source_curve_def_id, maturity_date, as_of_date]
ON dbo.source_price_curve 
(
    source_curve_def_id,
    maturity_date,
    as_of_date
);
