Sql-server – Nested loops estimate way too low, causes tempdb spills

optimization, performance, query-performance, sql-server, sql-server-2016

I'm working on tuning a query, and I think the main problem right now is that there is a Nested Loops step early on with a row estimate that is way too low, causing too little memory to be allocated and downstream steps to spill to tempdb. The troublesome step is the one in the green box below.

https://www.brentozar.com/pastetheplan/?id=Sk8E-6YAM

The two inputs into the Nested Loops both have accurate row estimates of 2.4M rows, but the output of that step is estimated at only 37 rows, while the actual is 2.4M.

All three Table Spool steps below have the exact same estimated vs actual as well, which makes me think they're getting their estimate from the Nested Loops. Every branch spills to tempdb in a Sort step.

I'm thinking that if I can get the estimated row count coming out of the Nested Loops corrected, it will prevent spills on the top branch, and the Table Spools will also inherit the correct estimate, get a sufficient memory grant, and stop spilling.

SQL Server 2016 SP2 with the legacy CE on.

Here's the query

SELECT  Object18.Column1,
        Object18.Column3,
        Function1(Object19.Column12) AS Column13,
        Function2(Object19.Column12) AS Column14,
        Function3(DISTINCT (CASE WHEN Object19.Column15 = ? THEN Object18.Column6 END)) AS Column16,
        Function3(DISTINCT (CASE WHEN Object19.Column17 = ? THEN Object18.Column6 END)) AS Column18,
        Function3(DISTINCT Object18.Column6) AS Column19
FROM Object8 Object18
JOIN Object20 Object19
  ON Object19.Column20 = Object18.Column7
 AND Object19.Column3 = Object18.Column3
WHERE Object19.Column8 = ?
GROUP BY Object18.Column1, Object18.Column3
OPTION (RECOMPILE)

Best Answer

I was able to get the estimates corrected by creating an index on the temp table after inserting into it. This eliminated the tempdb spills, but performance didn't improve much. Ultimately, separating the data query from the aggregation, and rearranging some other queries to get the most benefit from parallelism, together cut the run time by 4x. Thanks everyone, especially Joe Obbish.

Related Solutions

Sql-server – How (and why) does TOP impact an execution plan

I would have guessed that when a query includes TOP n the database engine would run the query ignoring the TOP clause, and then at the end just shrink that result set down to the n number of rows that was requested. The graphical execution plan seems to indicate this is the case -- TOP is the "last" step. But it appears there is more going on.

The way the above is phrased makes me think you may have an incorrect mental picture of how a query executes. An operator in a query plan is not a step (where the full result set of a previous step is evaluated by the next one).

SQL Server uses a pipelined execution model, where each operator exposes methods like Init(), GetRow(), and Close(). As the GetRow() name suggests, an operator produces one row at a time on demand (as required by its parent operator). This is documented in the Books Online Logical and Physical Operators reference, with more detail in my blog post Why Query Plans Run Backwards. This row-at-a-time model is essential in forming a sound intuition for query execution.
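To make the row-at-a-time idea concrete, here is a toy sketch in Python. It is not SQL Server's implementation: the classes `Scan`, `Filter`, and `Top` and the helper `run` are hypothetical, and only the method names (lower-cased here as `init`, `get_row`, `close`) are taken from the description above. It shows that a Top at the root simply stops asking its child for rows once it has n of them, so the scan never reads the rest of its input.

```python
class Scan:
    """Leaf operator: hands out rows from a list, one per get_row() call."""
    def __init__(self, rows):
        self.rows = rows
        self.rows_read = 0  # how many rows the parent actually pulled

    def init(self):
        self.pos = 0

    def get_row(self):
        if self.pos >= len(self.rows):
            return None  # end of input
        row = self.rows[self.pos]
        self.pos += 1
        self.rows_read += 1
        return row

    def close(self):
        pass


class Filter:
    """Pulls rows from its child until one satisfies the predicate."""
    def __init__(self, child, predicate):
        self.child = child
        self.predicate = predicate

    def init(self):
        self.child.init()

    def get_row(self):
        while True:
            row = self.child.get_row()
            if row is None or self.predicate(row):
                return row

    def close(self):
        self.child.close()


class Top:
    """Returns at most n rows, then reports end of input without asking the child again."""
    def __init__(self, child, n):
        self.child = child
        self.n = n

    def init(self):
        self.returned = 0
        self.child.init()

    def get_row(self):
        if self.returned >= self.n:
            return None
        row = self.child.get_row()
        if row is not None:
            self.returned += 1
        return row

    def close(self):
        self.child.close()


def run(root):
    """Drive the plan from the root, the way a client pulls result rows."""
    root.init()
    out = []
    while (row := root.get_row()) is not None:
        out.append(row)
    root.close()
    return out


scan = Scan(list(range(1, 1_000_001)))
plan = Top(Filter(scan, lambda r: r % 2 == 0), 3)
result = run(plan)
```

Here `result` is `[2, 4, 6]` and `scan.rows_read` is 6, even though a million rows were available: execution is driven from the root, and the scan only does as much work as its parents ask for.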

My question is, how (and why) does a TOP n clause impact the execution plan of a query?

Some logical operations like TOP, semi joins and the FAST n query hint affect the way the query optimizer costs execution plan alternatives. The basic idea is that one possible plan shape might return the first n rows more quickly than a different plan that was optimized to return all rows.

For example, indexed nested loops join is often the fastest way to return a small number of rows, though hash or merge join with scans might be more efficient on larger sets. The way the query optimizer reasons about these choices is by setting a Row Goal at a particular point in the logical tree of operations.

A row goal modifies the way query plan alternatives are costed. The essence of it is that the optimizer starts by costing each operator as if the full result set were required, sets a row goal at the appropriate point, and then works back down the plan tree estimating the number of rows it expects to need to examine to meet the row goal.

For example, a logical TOP(10) sets a row goal of 10 at a particular point in the logical query tree. The costs of operators leading up to the row goal are modified to estimate how many rows they need to produce to meet the row goal. This calculation can become complex, so it is easier to understand all this with a fully worked example and annotated execution plans. Row goals can affect more than the choice of join type or whether seeks and lookups are preferred to scans. More details on that here.
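As a rough illustration of the "working back down the tree" arithmetic, here is a deliberately simplified sketch in Python. It assumes qualifying rows are spread evenly through the input, which is my assumption for the example; the function name `rows_to_examine` and the numbers are hypothetical, and the optimizer's real calculation is more involved.

```python
def rows_to_examine(row_goal, qualifying_rows, input_rows):
    """Estimate how many input rows must be read to emit `row_goal` rows,
    assuming the qualifying rows are evenly distributed in the input."""
    if qualifying_rows <= 0 or row_goal >= qualifying_rows:
        # The goal cannot be met early, so the whole input is read.
        return input_rows
    return row_goal * input_rows / qualifying_rows


input_rows = 1_000_000      # rows in the scanned table
qualifying_rows = 10_000    # rows expected to pass the filter

full_estimate = input_rows  # no row goal: the scan is costed for every row
goal_estimate = rows_to_examine(10, qualifying_rows, input_rows)
```

With these numbers the scan below a TOP(10) is costed as if it reads about 1,000 rows instead of 1,000,000, which is why a plan shape that looks expensive for the full result can win once a row goal is in place.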

As always, an execution plan selected on the basis of a row goal is subject to the optimizer's reasoning abilities and the quality of information provided to it. Not every plan with a row goal will produce the required number of rows faster in practice, but according to the costing model it will.

Where a row goal plan proves not to be faster, there are usually ways to modify the query or provide better information to the optimizer such that the naturally selected plan is best. Which option is appropriate in your case depends on the details of course. The row goal feature is generally very effective (though there is a bug to watch out for when used in parallel execution plans).

Your particular query and plan may not be suitable for detailed analysis here (by all means provide an actual execution plan if you wish) but hopefully the ideas outlined here will allow you to make forward progress.

Sql-server – How can row estimates be improved in order to reduce chances of spills to tempdb

I won't comment on spills, tempdb or hints, because the query seems too simple to need that much consideration. I think SQL Server's optimizer will do its job quite well if there are indexes suited to the query.

Splitting it into two queries is a good idea, as it shows which indexes will be useful. The first part:

(select convert(bigint, Value) NodeId
 from Oav.ValueArray
 where PropertyId = 3331  
   and ObjectId = 3540233
   and Sequence = 2)

needs an index on (PropertyId, ObjectId, Sequence) that includes Value. I'd make it UNIQUE to be safe. The query would throw an error at run time anyway if more than one row were returned, so it's better to guarantee in advance that this can't happen, with a unique index:

CREATE UNIQUE INDEX
    PropertyId_ObjectId_Sequence_UQ
  ON Oav.ValueArray
    (PropertyId, ObjectId, Sequence) INCLUDE (Value) ;

The second part of the query:

select Value
  from Oav.ValueArray
 where ObjectId = @a               
   and PropertyId = 2840

needs an index on (PropertyId, ObjectId) including the Value:

CREATE INDEX
    PropertyId_ObjectId_IX
  ON Oav.ValueArray
    (PropertyId, ObjectId) INCLUDE (Value) ;

If efficiency does not improve, the indexes are not used, or the row estimates are still off, then the query would need a closer look.

In that case, the conversions (required by the EAV design, which stores different data types in the same column) are a probable cause, and your solution of splitting the query into two parts (as @Aaron Bertrand and @Paul White comment) seems natural and the way to go. A redesign that stores different data types in their own columns might be another option.
