Sql-server – Optimizer not choosing index union plan

execution-planoptimizationsql server

Why is the following query slow?

select count(*) 

from [dbo].[mt_dispatch_link] 
   , [dbo].[_mt_dispatch] [_mt_dispatch]

where   (mt_dispatch_link.contract_id_1 = _mt_dispatch.contract_id 
     and mt_dispatch_link.dispatch_id_1 = _mt_dispatch.dispatch_id)

   or   (mt_dispatch_link.contract_id_2 = _mt_dispatch.contract_id 
     and mt_dispatch_link.dispatch_id_2 = _mt_dispatch.dispatch_id)

enter image description here

It takes over 10 minutes, and then I tend to stop it at that point. My question is more about how to make sense of the query plan.

Looking at the query plan I can see that the bottom clustered index scan is returning about 250000 records, but the cost is 0% and it is being put into a temporary table.

The top index scan is about 25000 records.

But the cost of 95% comes from the nested join. What conclusions should I draw from this?

The above query plan shows two index scans, does that mean it is doing 25000 + 250000 index scans, or does it mean that it is doing 25000 * 250000 index scans?

If I change the query to this (adding FORCESEEK):

select count(*) 

from [dbo].[mt_dispatch_link] 
   , [dbo].[_mt_dispatch] [_mt_dispatch] 

    WITH (FORCESEEK)

where   (mt_dispatch_link.contract_id_1 = _mt_dispatch.contract_id 
     and mt_dispatch_link.dispatch_id_1 = _mt_dispatch.dispatch_id)

   or   (mt_dispatch_link.contract_id_2 = _mt_dispatch.contract_id 
     and mt_dispatch_link.dispatch_id_2 = _mt_dispatch.dispatch_id)

I end up with a much better plan and the query runs instantly:

enter image description here

I ran update statistics on both tables. Didn't fix it unfortunately. The table design is not very good, so I think SQL Server doesn't really understand and is thus coming up with a poor query plan. Some more info about the table design at How to Optimise Query.

Why is the query optimiser not coming up with the optimum plan?

Best Answer

The optimizer does not always consider index-union plans (like the one shown in your second graphic) to resolve disjunctions (OR predicates) unless a FORCESEEK or INDEX hint is specified. This is a heuristic* based on some practical considerations:

Index union is not often enough a good plan selection for general queries.
The number of ways indexes can be combined grows exponentially.

Using a hint changes the way the optimizer searches the space of possible plans. It disables some of the general heuristics and pursues a more goal-orientated strategy.

The optimizer's usual primary goal is to find a good plan quickly. It does not exhaustively search for the 'best' plan (even relatively simple queries could take years to compile if it did).

Joins with multiple conditions separated with OR have long been problematic. Over the years, the optimizer has added new tricks like converting them to equivalent UNION forms, but the transformations available are limited, so it is quite easy to come unstuck.

As far as the query plan is concerned:

The first row from DispatchLink causes a full scan of the Dispatch table
The result of the scan is stored in an internal tempdb worktable (the Table Spool)
The join checks every row from the worktable against the full OR predicate
The next row is fetched from DispatchLink and the process repeats from step 3

If there are 25,000 rows in the Dispatch Link table, the spool will be fully scanned 25,000 times. This is a disaster of course (and without index intersection, the best the optimizer can do is run the whole thing on multiple threads).

Percentage costs in query plans are only the optimizer's estimates. They never reflect actual execution costs, and are subject to the optimizer's model and will usually bear little resemblance to the 'true' cost of executing the plan on your specific hardware.

Costing numbers are there to be informative, but they should not be taken literally. The particular model the optimizer uses happens to produce pretty good plans for most queries on most systems across the world - that does not mean the model approximates anyone's reality, just that it happens to work reasonably well in practice.

Changing the design so that (Dispatch, Contract) pairs are stored in rows rather than repeated across columns will make the whole index-intersection problem go away. Relational designs with useful constraints and indexes almost always get the best out of the optimizer.

_{* This can be overridden with undocumented trace flag 8726}

Required Software Updates

The first is SQL Server. You are running SQL Server 2008 Service Pack 1 (build 2531). You ought to be patched up to at least the current Service Pack (SQL Server 2008 Service Pack 3 - build 5500). The most recent build of SQL Server 2008 at the time of writing is Service Pack 3, Cumulative Update 12 (build 5844).

The second piece of software is SQL Sentry Plan Explorer. The latest versions have significant new features and fixes, including the ability to directly upload a query plan for expert analysis (no need to paste XML anywhere!)

Query Plan Analysis

The cardinality estimate for the table variable is exactly right, thanks to a statement-level recompilation:

table variable estimate

Unfortunately, table variables do not maintain distribution statistics, so all the optimizer knows is that there are six rows; it knows nothing of the values that might be in those six rows. This information is crucial given that the next operation is a join to another table. The cardinality estimate from that join is based on a wild guess by the optimizer:

first join estimate

From that point on, the plan chosen by the optimizer is based on incorrect information, so it's no wonder really that performance is so poor. In particular, the memory set aside for sorts and hash tables for hash joins will be far too small. At execution time, the overflowing sorts and hashing operations will be spilled out to physical tempdb disk.

SQL Server 2008 does not highlight this in execution plans; you can monitor the spills using Extended Events or Profiler Sort Warnings and Hash Warnings. Memory is reserved for sorts and hashes based on cardinality estimates before execution starts, and cannot be increased during execution regardless of how much spare memory your SQL Server may have. Accurate row count estimates are therefore crucial for any execution plan that involves workspace memory consuming operations.

Your query is also parameterized. You should consider adding OPTION (RECOMPILE) to the query if different parameter values affect the query plan. You should probably consider using it anyway, so the optimizer can see the value of @Param1 at compilation time. If nothing else, this may help the optimizer produce a more reasonable estimate for the index seek shown above, given that the table is very large, and partitioned. It may also enable static partition elimination.

Try the query again with a temporary table instead of the table variable and OPTION (RECOMPILE). You should also try materializing the result of the first join into another temporary table, and run the rest of the query against that. The number of rows is not all that large (3,285,620) so this should be reasonably quick. The optimizer will then have an exact cardinality estimate and distribution statistics for the result of the join. With luck, the rest of the plan will fall into place nicely.

Working from the properties shown in the plan, the materializing query would be:

SELECT
    A.A_Var7_FK_LK,
    A.A_Var4_FK_LK,
    A.A_Var6_FK_LK, 
    A.A_Var5_FK_LK,
    A.A_Var1,
    A.A_Var2,
    A.A_Var3_FK_LK
INTO #AnotherTempTable
FROM @MyTableVar AS B
JOIN TableA AS A
    ON A.Job = B.B_Var1_PK
WHERE
    A_Var8_FK_LK = @Param1;

You could also INSERT into a predefined temporary table (the correct data types are not shown in the plan, so I cannot do that part). The new temporary table may or may not benefit from clustered and nonclustered indexes.

Sql-server – Understanding below execution plan

1. How to understand estimated operator cost? Tb1 which don't have index is scanned and cost is 2 %, whereas index is being used on tb2 and cost is 98%.

Plan

The heap table is only fully scanned once, but the index seek is executed 1,000,000 times. The optimizer estimates that a million seeks in this case will represent 98.4% of the total cost of executing the query, whereas a single parallel scan of the heap table will represent 0.9% of the cost.

These are just estimates used for internal plan choice reasons; they do not generally reflect real-world performance on modern hardware, and are never anything more than an estimate - even in a post-execution ("actual") execution plan.

In Management Studio:

Seek tooltip

In SQL Sentry Plan Explorer:

Plan Explorer tooltip

2. From the above snip of table scan (whose cost is 2%),number of executions are 24,will that means sql read rows in batches and stored in memory and for each row it did a seek operation from tbl2.

No, it means 24 parallel threads co-operated to perform a single scan of the heap table. Each thread still reads a row at a time from the scan, performs a seek into the indexed table, then gets the next row from the scan, and so on until the task is complete.

Rows are not read in batches and stored in memory in this plan. SQL Server reports 24 scans because 24 threads each performed a partial scan of the table, resulting in one full scan overall.

3. Also any pointers to know more about force scan,force index when I pressed F4 after clicking an operator

The ForceScan, ForceSeek, and ForcedIndex properties are set to true if the query specifies a FORCESCAN, FORCESEEK, or INDEX hint - or if the query optimizer decides that a particular access strategy is required for correctness (for example, when checking foreign key constraints).

Hint properties

Best Answer

Related Solutions

Sql-server – How to optimise T-SQL query using Execution Plan

Required Software Updates

Query Plan Analysis

Sql-server – Understanding below execution plan

Related Question