SQL Server 2008 R2 – Forcing Query Optimizer to Better Estimate Resources with Hints

hintsoptimizationperformancequery-performancesql serversql-server-2008-r2

I know there's a couple of alternate ways to write this query, but I'm more curious if there's another approach that I could use to improve the performance by helping guide the optimizer in the right direction? For example, can I use any kind of query hints / options to improve performance here?

Currently it runs in about 40 minutes and returns about 30,000 records.

Note: I'm also interested if there's new features in 2019 that apply here that don't apply in 2008 R2 (I know one big difference between the two is the Cardinality Estimator).

Best Answer

This query, as written, must compare every pair of rows, and the cost estimate is reasonably accurate.

On a supported version of SQL Server you might get a parallel plan, or there is an undocumented query hint that will force a parallel plan.

select t1.itemid, t2.itemid
from #temp T1  
inner join #temp as t2
  on t1.itemid like '%' + t2.itemid + '%'
  and t2.itemid <> t1.itemid
OPTION(USE HINT('ENABLE_PARALLEL_PLAN_PREFERENCE'))

Related Solutions

Sql-server – Procedure cache (plan cache) size

You can't control plan cache size manually. The following Connect item describes both the fact that they won't be adding a knob any time soon and also that Resource Governor might be marginally useful in a few narrow use cases:

http://connect.microsoft.com/SQLServer/feedback/details/293188/amount-of-ram-for-procedure-cache-should-be-configurable

However, as I've alluded to in the comments, there are other ways you can make the plan cache more effective (e.g. use the "optimize for ad hoc workloads" setting). Kimberly explains this in greater detail, but essentially, this setting places much smaller stubs in the cache until the same plan is used more than once, and only then does it cache the whole thing. Under the default configuration, and especially in cases where a lot of dynamically assembled SQL is in place, single-use plans often waste a lot of space in the cache. If you reduce the amount of wasted space these occupy, you leave more room for plans that are used more than once, and those are precisely the ones you want to keep around anyway.

SQL Server 2014 – Query 100x Slower Due to Row Count Spool Row Estimate

Why does this query need a Row Count Spool operator? ... what specific optimization is it trying to provide?

The cust_nbr column in #existingCustomers is nullable. If it actually contains any nulls the correct response here is to return zero rows (NOT IN (NULL,...) will always yield an empty result set.).

So the query can be thought of as

SELECT p.*
FROM   #potentialNewCustomers p
WHERE  NOT EXISTS (SELECT *
                   FROM   #existingCustomers e1
                   WHERE  p.cust_nbr = e1.cust_nbr)
       AND NOT EXISTS (SELECT *
                       FROM   #existingCustomers e2
                       WHERE  e2.cust_nbr IS NULL)

With the rowcount spool there to avoid having to evaluate the

EXISTS (SELECT *
        FROM   #existingCustomers e2
        WHERE  e2.cust_nbr IS NULL)

More than once.

This just seems to be a case where a small difference in assumptions can make quite a catastrophic difference in performance.

After updating a single row as below...

UPDATE #existingCustomers
SET    cust_nbr = NULL
WHERE  cust_nbr = 1;

... the query completed in less than a second. The row counts in actual and estimated versions of the plan are now nearly spot on.

SET STATISTICS TIME ON;
SET STATISTICS IO ON;

SELECT *
FROM   #potentialNewCustomers
WHERE  cust_nbr NOT IN (SELECT cust_nbr
                        FROM   #existingCustomers 
                       )

Zero rows are output as described above.

The Statistics Histograms and auto update thresholds in SQL Server are not granular enough to detect this kind of single row change. Arguably if the column is nullable it might be reasonable to work on the basis that it contains at least one NULL even if the statistics histogram doesn't currently indicate that there are any.

Best Answer

Related Solutions

Sql-server – Procedure cache (plan cache) size

SQL Server 2014 – Query 100x Slower Due to Row Count Spool Row Estimate

Related Question