Sql-server – Query 100x slower in SQL Server 2014, Row Count Spool row estimate the culprit

Tags: cardinality-estimates, performance, query-performance, sql server, sql server 2014

I have a query that runs in 800 milliseconds in SQL Server 2012 and takes about 170 seconds in SQL Server 2014. I think that I've narrowed this down to a poor cardinality estimate for the Row Count Spool operator. I've read a bit about spool operators (e.g., here and here), but am still having trouble understanding a few things:

Why does this query need a Row Count Spool operator? I don't think it's necessary for correctness, so what specific optimization is it trying to provide?
Why does SQL Server estimate that the join to the Row Count Spool operator removes all rows?
Is this a bug in SQL Server 2014? If so, I'll file it on Connect. But I'd like a deeper understanding first.

Note: I can re-write the query as a LEFT JOIN or add indexes to the tables in order to achieve acceptable performance in both SQL Server 2012 and SQL Server 2014. So this question is more about understanding this specific query and plan in depth and less about how to phrase the query differently.

The slow query

See this Pastebin for a full test script. Here is the specific test query I'm looking at:

-- Prune any existing customers from the set of potential new customers
-- This query is much slower than expected in SQL Server 2014 
SELECT *
FROM #potentialNewCustomers -- 10K rows
WHERE cust_nbr NOT IN (
    SELECT cust_nbr
    FROM #existingCustomers -- 1MM rows
)

SQL Server 2014: The estimated query plan

SQL Server believes that the Left Anti Semi Join to the Row Count Spool will filter the 10,000 rows down to 1 row. For this reason, it selects a LOOP JOIN for the subsequent join to #existingCustomers.

SQL Server 2014: The actual query plan

As expected (by everyone but SQL Server!), the Row Count Spool did not remove any rows. So we are looping 10,000 times when SQL Server expected to loop just once.

SQL Server 2012: The estimated query plan

When using SQL Server 2012 (or OPTION (QUERYTRACEON 9481) in SQL Server 2014), the Row Count Spool does not reduce the estimated number of rows, and a hash join is chosen, resulting in a far better plan.

The LEFT JOIN re-write

For reference, here is one way I could re-write the query to achieve good performance in all of SQL Server 2012, 2014, and 2016. However, I'm still interested in the specific behavior of the query above and whether it is a bug in the new SQL Server 2014 Cardinality Estimator.

-- Re-writing with LEFT JOIN yields much better performance in 2012/2014/2016
SELECT n.*
FROM #potentialNewCustomers n
LEFT JOIN (SELECT 1 AS test, cust_nbr FROM #existingCustomers) c
    ON c.cust_nbr = n.cust_nbr
WHERE c.test IS NULL
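One point worth keeping in mind about this re-write: it returns the same rows as the NOT IN query only while #existingCustomers.cust_nbr contains no NULLs. The sketch below illustrates the difference using Python's built-in sqlite3 module as a stand-in for SQL Server. The table names potential and existing and the sample values are hypothetical, and SQLite's optimizer is unrelated to SQL Server's, so this shows result semantics only, not plan behavior.

```python
import sqlite3

# In-memory database; "potential" and "existing" are hypothetical stand-ins
# for #potentialNewCustomers and #existingCustomers.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE potential (cust_nbr INTEGER NOT NULL);
    CREATE TABLE existing  (cust_nbr INTEGER);          -- nullable
    INSERT INTO potential VALUES (1), (2), (3);
    INSERT INTO existing  VALUES (1), (NULL);           -- one NULL present
""")

# Original form: NOT IN against a set that contains a NULL.
not_in_rows = conn.execute("""
    SELECT cust_nbr
    FROM potential
    WHERE cust_nbr NOT IN (SELECT cust_nbr FROM existing)
    ORDER BY cust_nbr
""").fetchall()

# Re-written form: LEFT JOIN and keep the rows that found no match.
left_join_rows = conn.execute("""
    SELECT n.cust_nbr
    FROM potential AS n
    LEFT JOIN (SELECT 1 AS test, cust_nbr FROM existing) AS c
        ON c.cust_nbr = n.cust_nbr
    WHERE c.test IS NULL
    ORDER BY n.cust_nbr
""").fetchall()
conn.close()

print(not_in_rows)     # [] because the NULL makes every NOT IN test unknown
print(left_join_rows)  # [(2,), (3,)] because the NULL row simply never joins
```

If the column never holds NULLs, the two forms agree, which is the situation in the test script above.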

Best Answer

Why does this query need a Row Count Spool operator? ... what specific optimization is it trying to provide?

The cust_nbr column in #existingCustomers is nullable. If it actually contains any NULLs, the correct response here is to return zero rows (NOT IN (NULL, ...) will always yield an empty result set).

So the query can be thought of as

SELECT p.*
FROM   #potentialNewCustomers p
WHERE  NOT EXISTS (SELECT *
                   FROM   #existingCustomers e1
                   WHERE  p.cust_nbr = e1.cust_nbr)
       AND NOT EXISTS (SELECT *
                       FROM   #existingCustomers e2
                       WHERE  e2.cust_nbr IS NULL)

The Row Count Spool is there to avoid having to evaluate the following subquery more than once:

EXISTS (SELECT *
        FROM   #existingCustomers e2
        WHERE  e2.cust_nbr IS NULL)
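The equivalence between NOT IN and the pair of NOT EXISTS checks can be tried out with a small sketch. It uses Python's built-in sqlite3 module rather than SQL Server, the table names potential and existing and the sample values are hypothetical, and it assumes the outer cust_nbr column is not nullable (the equivalence does not hold otherwise).

```python
import sqlite3

NOT_IN = """
    SELECT cust_nbr
    FROM potential
    WHERE cust_nbr NOT IN (SELECT cust_nbr FROM existing)
    ORDER BY cust_nbr
"""

TWO_NOT_EXISTS = """
    SELECT p.cust_nbr
    FROM potential AS p
    WHERE NOT EXISTS (SELECT 1 FROM existing AS e1
                      WHERE e1.cust_nbr = p.cust_nbr)
      AND NOT EXISTS (SELECT 1 FROM existing AS e2
                      WHERE e2.cust_nbr IS NULL)
    ORDER BY p.cust_nbr
"""

def run_both(existing_values):
    """Run both query forms against a fresh in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE potential (cust_nbr INTEGER NOT NULL)")
    conn.execute("CREATE TABLE existing (cust_nbr INTEGER)")  # nullable
    conn.executemany("INSERT INTO potential VALUES (?)", [(1,), (2,), (3,)])
    conn.executemany("INSERT INTO existing VALUES (?)",
                     [(v,) for v in existing_values])
    not_in = [row[0] for row in conn.execute(NOT_IN)]
    not_exists = [row[0] for row in conn.execute(TWO_NOT_EXISTS)]
    conn.close()
    return not_in, not_exists

no_nulls = run_both([1, 10])         # existing has no NULLs
with_null = run_both([1, 10, None])  # existing has a single NULL

print(no_nulls)   # ([2, 3], [2, 3])
print(with_null)  # ([], [])
```

The second NOT EXISTS does not depend on the outer row, so its answer is the same for every row of the outer table. That is what makes it a candidate for being computed once and replayed.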

This just seems to be a case where a small difference in assumptions can make quite a catastrophic difference in performance.

After updating a single row as below...

UPDATE #existingCustomers
SET    cust_nbr = NULL
WHERE  cust_nbr = 1;

... the query completed in less than a second. The row counts in actual and estimated versions of the plan are now nearly spot on.

SET STATISTICS TIME ON;
SET STATISTICS IO ON;

SELECT *
FROM   #potentialNewCustomers
WHERE  cust_nbr NOT IN (SELECT cust_nbr
                        FROM   #existingCustomers);

Zero rows are output as described above.

The statistics histograms and auto-update thresholds in SQL Server are not granular enough to detect this kind of single-row change. Arguably, if the column is nullable, it might be reasonable to work on the basis that it contains at least one NULL even if the statistics histogram doesn't currently indicate that there are any.

Related Solutions

Sql-server – Query is slow in SQL Server 2014, fast in SQL Server 2012

The most likely situation is that the new SQL 2014 Cardinality Estimator is yielding a poor row estimate for one or more joins in your query and this has led SQL Server to choose an inefficient plan.

If you are able to run the query in SQL 2014 with "include actual execution plan" turned on, you can use the query below in another tab to view the real-time progress of rows flowing through each query operator. I noticed that you only have an estimated plan for 2014 (compared to an actual plan for 2012), presumably because you cannot run the query to completion in SQL 2014. So this could give you more insight into the actual rows flowing through the query in 2014 and may lead you to a way of tweaking the query that runs efficiently using the new Cardinality Estimator.

In the meantime, until you are able to optimize the query, you could use QUERYTRACEON with trace flag 9481 for this query. Alternatively, you could follow Brent Ozar's advice: run the database at the SQL 2012 compatibility level, carefully test your queries with the new Cardinality Estimator, and only update the compatibility level to 120 (SQL 2014) once you are satisfied with the results.

/* Live query progress in SQL 2014 */
SELECT session_id, node_id, physical_operator_name,
    SUM(row_count) AS row_count,
    SUM(estimate_row_count) AS estimate_row_count,
    CAST(SUM(row_count)*100 AS float)/NULLIF(SUM(estimate_row_count),0) AS percent_complete,
    SUM(elapsed_time_ms) AS elapsed_time_ms,
    SUM(cpu_time_ms) AS cpu_time_ms,
    SUM(logical_read_count) AS logical_read_count,
    SUM(physical_read_count) AS physical_read_count,
    SUM(write_page_count) AS spill_page_count,
    SUM(segment_read_count) AS segment_read_count,
    SUM(segment_skip_count) AS segment_skip_count,
    COUNT(*) AS num_threads
FROM sys.dm_exec_query_profiles 
WHERE session_id <> @@spid
GROUP BY session_id,node_id,physical_operator_name
ORDER BY session_id,node_id;

Sql-server – Why does LEN() function badly underestimate cardinality in SQL Server 2014

Under the legacy CE, the estimate is 3.16228% of the rows. That is a "magic number" heuristic used for column = literal predicates. There are other heuristics based on predicate construction, but the legacy CE result with LEN wrapped around the column matches this guess framework. You can see examples of this in a post on Selectivity Guesses in absence of Statistics by Joe Sack, and Constant-Constant Comparison Estimation by Ian Jose.

-- Legacy CE: 31622.8 rows
SELECT  COUNT(*)
FROM    #customers
WHERE   LEN(cust_nbr) = 6
OPTION  ( QUERYTRACEON 9481); -- Legacy CE
GO
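The 31622.8 figure can be reproduced with a line of arithmetic. The sketch below assumes a table of 1,000,000 rows (the cardinality that appears in the trace output further down). The comparison with the row count raised to the power 0.75 is an assumption about where the percentage comes from, included only because the numbers line up; it is not taken from the posts cited above.

```python
# Assumed table size, matching CARD=1e+006 in the trace output.
table_rows = 1_000_000

# Fixed selectivity guess quoted above: 3.16228 % of the rows.
legacy_selectivity = 0.0316228
legacy_estimate = table_rows * legacy_selectivity

# Assumption for illustration: the same figure falls out of rows ** 0.75.
power_guess = table_rows ** 0.75

print(round(legacy_estimate, 1))  # 31622.8
print(round(power_guess, 1))      # 31622.8
```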

As for the new CE, it looks like the column inside LEN is now visible to the optimizer (which means we can use statistics). I went through the calculator output below, and you can look at the associated auto-generated stats as a pointer:

-- New CE: 1.00007 rows
SELECT  COUNT(*)
FROM    #customers
WHERE   LEN(cust_nbr) = 6
OPTION  ( QUERYTRACEON 2312 ); -- New CE
GO

-- View New CE behavior with 2363 (for supported option use XEvents)
SELECT  COUNT(*)
FROM    #customers
WHERE   LEN(cust_nbr) = 6
OPTION  (QUERYTRACEON 2312, QUERYTRACEON 2363, QUERYTRACEON 3604, RECOMPILE); -- New CE
GO

/*
Loaded histogram for column QCOL:
[tempdb].[dbo].[#customers].cust_nbr from stats with id 2
Using ambient cardinality 1e+006 to combine distinct counts:
  999927
 
Combined distinct count: 999927
Selectivity: 1.00007e-006
Stats collection generated:
  CStCollFilter(ID=2, CARD=1.00007)
      CStCollBaseTable(ID=1, CARD=1e+006 TBL: #customers)
 
End selectivity computation
*/
 
EXEC tempdb..sp_helpstats '#customers';


-- Check out the AVG_RANGE_ROWS values (for example, plenty of them are ~1)
DBCC SHOW_STATISTICS('tempdb..#customers', '_WA_Sys_00000001_B0368087');
-- That is my statistics name; yours will differ

Unfortunately the logic relies on an estimate of the number of distinct values, which is not adjusted for the effect of the LEN function.
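The numbers in the trace output are consistent with a selectivity of one over the distinct count. The sketch below redoes that arithmetic with the two values printed in the trace (1e+006 rows, 999927 distinct values). Reading the calculation as "1 / distinct count" is an interpretation of the output, not documented behavior.

```python
# Values copied from the trace flag 2363 output above.
table_rows = 1_000_000     # CStCollBaseTable CARD=1e+006
distinct_values = 999_927  # Combined distinct count

# Interpretation: an equality predicate is assumed to hit one distinct value.
selectivity = 1 / distinct_values
estimate = table_rows * selectivity

print(f"{selectivity:.5e}")  # 1.00007e-06
print(f"{estimate:.5f}")     # 1.00007
```

Because cust_nbr is almost unique, nearly a million distinct values of cust_nbr collapse into only a handful of distinct LEN results, so reusing the column's distinct count here badly understates how many rows share a given length.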

Possible workaround

You can get a trie-based estimate under both CE models by rewriting the LEN as a LIKE:

SELECT COUNT_BIG(*)
FROM #customers AS C
WHERE C.cust_nbr LIKE REPLICATE('_', 6);
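To check that the two predicates select the same rows, here is a small sketch using Python's built-in sqlite3 module. SQLite's length() stands in for LEN, the pattern is built in Python because SQLite has no REPLICATE, and the table name customers and the sample values are hypothetical. It covers values without trailing spaces only, since LEN in SQL Server ignores trailing spaces.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (cust_nbr TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO customers VALUES (?)",
    [("12345",), ("123456",), ("ABCDEF",), ("1234567",)],
)

# Six single-character wildcards, the equivalent of REPLICATE('_', 6).
pattern = "_" * 6

by_length = conn.execute(
    "SELECT COUNT(*) FROM customers WHERE length(cust_nbr) = 6"
).fetchone()[0]
by_like = conn.execute(
    "SELECT COUNT(*) FROM customers WHERE cust_nbr LIKE ?", (pattern,)
).fetchone()[0]
conn.close()

print(by_length, by_like)  # 2 2
```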

Information on Trace Flags used:

2363: shows a lot of information, including statistics being loaded.
3604: prints the output of DBCC commands to the messages tab.
