Sql-server – Why does the estimated cost of (the same) 1000 seeks on a unique index differ in these plans

optimizationsql serversql server 2014

In the queries below both execution plans are estimated to perform 1,000 seeks on a unique index.

The seeks are driven by an ordered scan on the same source table so seemingly should end up seeking the same values in the same order.

Both nested loops have <NestedLoops Optimized="false" WithOrderedPrefetch="true">

Anyone know why this task is costed at 0.172434 in the first plan but 3.01702 in the second?

(The reason for the question is that the first query was suggested to me as an optimisation due to the apparent much lower plan cost. It actually looks to me as though it does more work but I'm just attempting to explain the discrepancy…)

Setup

CREATE TABLE dbo.Target(KeyCol int PRIMARY KEY, OtherCol char(32) NOT NULL);

CREATE TABLE dbo.Staging(KeyCol int PRIMARY KEY, OtherCol char(32) NOT NULL); 

INSERT INTO dbo.Target
SELECT TOP (1000000) ROW_NUMBER() OVER (ORDER BY @@SPID), LEFT(NEWID(),32)
FROM master..spt_values v1,  
     master..spt_values v2;

INSERT INTO dbo.Staging
SELECT TOP (1000) ROW_NUMBER() OVER (ORDER BY @@SPID), LEFT(NEWID(),32)
FROM master..spt_values v1;

Query 1 "Paste the plan" link

WITH T
     AS (SELECT *
         FROM   Target AS T
         WHERE  T.KeyCol IN (SELECT S.KeyCol
                             FROM   Staging AS S))
MERGE T
USING Staging S
ON ( T.KeyCol = S.KeyCol )
WHEN NOT MATCHED THEN
  INSERT ( KeyCol, OtherCol )
  VALUES(S.KeyCol, S.OtherCol )
WHEN MATCHED AND T.OtherCol > S.OtherCol THEN
  UPDATE SET T.OtherCol = S.OtherCol;

Query 2 "Paste the plan" link

MERGE Target T
USING Staging S
ON ( T.KeyCol = S.KeyCol )
WHEN NOT MATCHED THEN
  INSERT ( KeyCol, OtherCol )
  VALUES( S.KeyCol, S.OtherCol )
WHEN MATCHED AND T.OtherCol > S.OtherCol THEN
  UPDATE SET T.OtherCol = S.OtherCol; 

Query 1

Query 2

The above was tested on SQL Server 2014 (SP2) (KB3171021) – 12.0.5000.0 (X64)


@Joe Obbish points out in the comments that a simpler repro would be

SELECT *
FROM staging AS S 
  LEFT OUTER JOIN Target AS T 
    ON T.KeyCol = S.KeyCol;

vs

SELECT *
FROM staging AS S 
  LEFT OUTER JOIN (SELECT * FROM Target) AS T 
    ON T.KeyCol = S.KeyCol;

For the 1,000 row staging table both of the above still have the same plan shape with nested loops and the plan without the derived table appearing cheaper, but for a 10,000 row staging table and same target table as above the difference in costs does change the plan shape (with a full scan and merge join seeming relatively more attractive than expensively costed seeks) showing this cost discrepancy can have implications other than just making it harder to compare plans.

enter image description here

Best Answer

Anyone know why this task is costed at 0.172434 in the first plan but 3.01702 in the second?

Generally speaking, an inner side seek below a nested loops join is costed assuming a random I/O pattern. There is a simple replacement-based reduction for subsequent accesses, accounting for the chance that the required page has already been brought into memory by a previous iteration. This basic assessment produces the standard (higher) cost.

There is another costing input, Smart Seek Costing, about which little detail is known. My guess (and that is all it is at this stage) is that SSC attempts to assess inner side seek I/O cost in more detail, perhaps by considering local ordering and/or the range of values to fetch. Who knows.

For example, the first seeking operation brings in not just the requested row, but all rows on that page (in index order). Given the overall access pattern, fetching the 1000 rows in 1000 seeks requires only 2 physical reads, even with read-ahead and prefetching disabled. From that perspective, the default I/O costing represents a significant overestimate, and the SSC-adjusted cost is closer to reality.

It seems reasonable to expect that SSC would be most effective where the loop drives an index seek more or less directly, and the join outer reference is the basis of the seeking operation. From what I can tell, SSC is always attempted for suitable physical operations, but most often produces no downward adjustment when the seek is separated from the join by other operations. Simple filters are one exception to this, perhaps because SQL Server can often push these into the data access operator. In any case, the optimizer has pretty deep support for selections.

It is unfortunate that the Compute Scalar for the subquery outer projections seems to interfere with SSC here. Compute Scalars are usually relocated above the join, but these ones have to stay where they are. Even so, most normal Compute Scalars are pretty transparent to optimization, so this is a bit surprising.

Regardless, when the physical operation PhyOp_Range is produced from a simple selection on an index SelIdxToRng, SSC is effective. When the more complex SelToIdxStrategy (selection on a table to an index strategy) is employed, the resulting PhyOp_Range runs SSC but results in no reduction. Again, it seems that simpler, more direct operations work best with SSC.

I wish I could tell you exactly what SSC does, and show the exact calculations, but I don't know those details. If you want to explore the limited trace output available for yourself, you can employ undocumented trace flag 2398. An example output is:

Smart seek costing (7.1) :: 1.34078e+154 , 0.001

That example relates to memo group 7, alternative 1, showing a cost upper bound, and a factor of 0.001. To see cleaner factors, be sure to rebuild the tables without parallelism so the pages are as dense as possible. Without doing that, the factor is more like 0.000821 for your example Target table. There are some fairly obvious relationships there, of course.

SSC can also be disabled with undocumented trace flag 2399. With that flag active, both costs are the higher value.