SQL Server – Why does the estimated cost of (the same) 1000 seeks on a unique index differ in these plans?

optimization, sql server, sql server 2014

In the queries below, both execution plans are estimated to perform 1,000 seeks on a unique index.

The seeks are driven by an ordered scan on the same source table so seemingly should end up seeking the same values in the same order.

Both nested loops have <NestedLoops Optimized="false" WithOrderedPrefetch="true">

Anyone know why this task is costed at 0.172434 in the first plan but 3.01702 in the second?

(The reason for the question is that the first query was suggested to me as an optimisation due to the apparent much lower plan cost. It actually looks to me as though it does more work but I'm just attempting to explain the discrepancy…)

Setup

CREATE TABLE dbo.Target(KeyCol int PRIMARY KEY, OtherCol char(32) NOT NULL);

CREATE TABLE dbo.Staging(KeyCol int PRIMARY KEY, OtherCol char(32) NOT NULL); 

INSERT INTO dbo.Target
SELECT TOP (1000000) ROW_NUMBER() OVER (ORDER BY @@SPID), LEFT(NEWID(),32)
FROM master..spt_values v1,  
     master..spt_values v2;

INSERT INTO dbo.Staging
SELECT TOP (1000) ROW_NUMBER() OVER (ORDER BY @@SPID), LEFT(NEWID(),32)
FROM master..spt_values v1;

Query 1 ("Paste the Plan" link)

WITH T
     AS (SELECT *
         FROM   Target AS T
         WHERE  T.KeyCol IN (SELECT S.KeyCol
                             FROM   Staging AS S))
MERGE T
USING Staging S
ON ( T.KeyCol = S.KeyCol )
WHEN NOT MATCHED THEN
  INSERT ( KeyCol, OtherCol )
  VALUES(S.KeyCol, S.OtherCol )
WHEN MATCHED AND T.OtherCol > S.OtherCol THEN
  UPDATE SET T.OtherCol = S.OtherCol;

Query 2 ("Paste the Plan" link)

MERGE Target T
USING Staging S
ON ( T.KeyCol = S.KeyCol )
WHEN NOT MATCHED THEN
  INSERT ( KeyCol, OtherCol )
  VALUES( S.KeyCol, S.OtherCol )
WHEN MATCHED AND T.OtherCol > S.OtherCol THEN
  UPDATE SET T.OtherCol = S.OtherCol;

(Execution plan images for Query 1 and Query 2)

The above was tested on SQL Server 2014 (SP2) (KB3171021) – 12.0.5000.0 (X64)

@Joe Obbish points out in the comments that a simpler repro would be:

SELECT *
FROM Staging AS S
  LEFT OUTER JOIN Target AS T 
    ON T.KeyCol = S.KeyCol;

SELECT *
FROM Staging AS S
  LEFT OUTER JOIN (SELECT * FROM Target) AS T 
    ON T.KeyCol = S.KeyCol;

For the 1,000 row staging table, both of the above still have the same plan shape with nested loops, and the plan without the derived table appears cheaper. For a 10,000 row staging table and the same target table as above, however, the difference in costs does change the plan shape: a full scan and merge join seem relatively more attractive than expensively costed seeks. This shows that the cost discrepancy can have implications beyond just making it harder to compare plans.
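One way to compare the estimated costs of the two repro queries without executing them is to request estimated plans. The following is only a sketch: it assumes the setup tables above exist in the current database under the dbo schema, and that the batches are run from a client that understands the GO separator (such as SSMS or sqlcmd). The cost of each operator appears in the EstimatedTotalSubtreeCost attribute of its RelOp element in the returned XML.

```sql
-- SET SHOWPLAN_XML must be the only statement in its batch.
SET SHOWPLAN_XML ON;
GO

-- Direct reference to the target table.
SELECT *
FROM dbo.Staging AS S
LEFT OUTER JOIN dbo.Target AS T
    ON T.KeyCol = S.KeyCol;

-- Same join, with the target wrapped in a derived table.
SELECT *
FROM dbo.Staging AS S
LEFT OUTER JOIN (SELECT * FROM dbo.Target) AS T
    ON T.KeyCol = S.KeyCol;
GO

SET SHOWPLAN_XML OFF;
GO
```

Neither query is executed while SHOWPLAN_XML is on; only the estimated plans are returned.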

Best Answer

Anyone know why this task is costed at 0.172434 in the first plan but 3.01702 in the second?

Generally speaking, an inner side seek below a nested loops join is costed assuming a random I/O pattern. There is a simple replacement-based reduction for subsequent accesses, accounting for the chance that the required page has already been brought into memory by a previous iteration. This basic assessment produces the standard (higher) cost.

There is another costing input, Smart Seek Costing, about which little detail is known. My guess (and that is all it is at this stage) is that SSC attempts to assess inner side seek I/O cost in more detail, perhaps by considering local ordering and/or the range of values to fetch. Who knows.

For example, the first seeking operation brings in not just the requested row, but all rows on that page (in index order). Given the overall access pattern, fetching the 1000 rows in 1000 seeks requires only 2 physical reads, even with read-ahead and prefetching disabled. From that perspective, the default I/O costing represents a significant overestimate, and the SSC-adjusted cost is closer to reality.
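To check the physical read count on a test system, something along these lines could be used. It is a sketch under assumptions, not the exact script behind the figures above: it assumes trace flag 652 (disable read-ahead) and trace flag 8744 (disable nested loops prefetching) are enabled at session scope, and it uses DBCC DROPCLEANBUFFERS to start from a cold cache, which should never be run on a production server. The LOOP JOIN hint is added only to keep the nested loops plan shape. The exact numbers reported will depend on page density and on which upper-level index pages have to be read.

```sql
-- Test systems only: this empties the buffer pool.
CHECKPOINT;
DBCC DROPCLEANBUFFERS;

-- Assumption: 652 disables read-ahead, 8744 disables nested loops prefetching.
DBCC TRACEON (652, 8744);

SET STATISTICS IO ON;

-- 1,000 seeks into dbo.Target, driven by an ordered scan of dbo.Staging.
SELECT S.KeyCol, T.OtherCol
FROM dbo.Staging AS S
LEFT OUTER JOIN dbo.Target AS T
    ON T.KeyCol = S.KeyCol
OPTION (LOOP JOIN, RECOMPILE);

SET STATISTICS IO OFF;

DBCC TRACEOFF (652, 8744);
```

The "physical reads" figure reported for Target in the messages output is the number to compare against the I/O cost the optimizer assigned to the seeks.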

It seems reasonable to expect that SSC would be most effective where the loop drives an index seek more or less directly, and the join outer reference is the basis of the seeking operation. From what I can tell, SSC is always attempted for suitable physical operations, but most often produces no downward adjustment when the seek is separated from the join by other operations. Simple filters are one exception to this, perhaps because SQL Server can often push these into the data access operator. In any case, the optimizer has pretty deep support for selections.

It is unfortunate that the Compute Scalar for the subquery outer projections seems to interfere with SSC here. Compute Scalars are usually relocated above the join, but these ones have to stay where they are. Even so, most normal Compute Scalars are pretty transparent to optimization, so this is a bit surprising.

Regardless, when the physical operation PhyOp_Range is produced from a simple selection on an index (rule SelIdxToRng), SSC is effective. When the more complex rule SelToIdxStrategy (selection on a table to an index strategy) is employed, the resulting PhyOp_Range runs SSC but results in no reduction. Again, it seems that simpler, more direct operations work best with SSC.

I wish I could tell you exactly what SSC does, and show the exact calculations, but I don't know those details. If you want to explore the limited trace output available for yourself, you can employ undocumented trace flag 2398. An example output is:

Smart seek costing (7.1) :: 1.34078e+154 , 0.001

That example relates to memo group 7, alternative 1, showing a cost upper bound, and a factor of 0.001. To see cleaner factors, be sure to rebuild the tables without parallelism so the pages are as dense as possible. Without doing that, the factor is more like 0.000821 for your example Target table. There are some fairly obvious relationships there, of course.
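A minimal sketch of how that trace output might be obtained is shown below. It assumes the setup tables from the question, that trace flag 3604 is needed to route the optimizer trace output to the client messages tab, and that the caller has permission to use QUERYTRACEON. Trace flag 2398 is undocumented and unsupported, so this is for exploration on a test system only.

```sql
-- Rebuild serially so the pages are as dense as possible.
ALTER INDEX ALL ON dbo.Target REBUILD WITH (MAXDOP = 1);
ALTER INDEX ALL ON dbo.Staging REBUILD WITH (MAXDOP = 1);

-- Assumption: 3604 sends trace output to the client;
-- 2398 is the undocumented smart seek costing trace flag.
SELECT *
FROM dbo.Staging AS S
LEFT OUTER JOIN dbo.Target AS T
    ON T.KeyCol = S.KeyCol
OPTION (RECOMPILE, QUERYTRACEON 3604, QUERYTRACEON 2398);
```

RECOMPILE is included so the statement is optimized afresh each time; no trace output is produced when a cached plan is reused.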

SSC can also be disabled with undocumented trace flag 2399. With that flag active, both costs are the higher value.
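As a sketch of how that comparison could be made, the simpler repro can be compiled with and without the flag and the estimated seek costs compared. This assumes the setup tables from the question, a client that understands the GO separator, and permission to use QUERYTRACEON. Trace flag 2399 is undocumented and unsupported, so use it on a test system only.

```sql
SET SHOWPLAN_XML ON;
GO

-- Default costing: the seek should show the lower, SSC-adjusted cost.
SELECT *
FROM dbo.Staging AS S
LEFT OUTER JOIN dbo.Target AS T
    ON T.KeyCol = S.KeyCol
OPTION (RECOMPILE);

-- Undocumented flag 2399: smart seek costing disabled for this statement.
SELECT *
FROM dbo.Staging AS S
LEFT OUTER JOIN dbo.Target AS T
    ON T.KeyCol = S.KeyCol
OPTION (RECOMPILE, QUERYTRACEON 2399);
GO

SET SHOWPLAN_XML OFF;
GO
```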

Related Solutions

SQL Server – SELECT/INSERT Deadlock

On the face of it, this looks like a classic lookup deadlock. The essential ingredients for this deadlock pattern are:

a SELECT query that uses a non-covering nonclustered index with a Key Lookup
an INSERT query that modifies the clustered index and then the nonclustered index

The SELECT accesses the nonclustered index first, then the clustered index. The INSERT accesses the clustered index first, then the nonclustered index. Accessing the same resources in a different order while acquiring incompatible locks is a great way to 'achieve' a deadlock, of course.
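The access order described above can be illustrated with a small, entirely hypothetical schema (the table, index, and column names below are invented for illustration and are not from the system in question). Running these statements will not reliably produce a deadlock, since the pattern is very sensitive to timing; the sketch only shows which structures each statement touches, and in what order.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE dbo.DemoLookup
(
    Id        integer   NOT NULL PRIMARY KEY CLUSTERED,
    SearchCol integer   NOT NULL,
    Payload   char(100) NOT NULL
);

-- Non-covering nonclustered index: Payload is not included.
CREATE NONCLUSTERED INDEX IX_DemoLookup_SearchCol
    ON dbo.DemoLookup (SearchCol);

-- Session 1: seeks the nonclustered index first, then performs a
-- Key Lookup into the clustered index to fetch Payload.
SELECT D.Payload
FROM dbo.DemoLookup AS D WITH (INDEX (IX_DemoLookup_SearchCol))
WHERE D.SearchCol = 42;

-- Session 2: modifies the clustered index first, then maintains
-- the nonclustered index.
INSERT dbo.DemoLookup (Id, SearchCol, Payload)
VALUES (1, 42, 'x');
```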

In this case, the SELECT query is:

(Execution plan image: SELECT query)

...and the INSERT query is:

(Execution plan image: INSERT query)

Notice the nonclustered index maintenance highlighted in green.

We would need to see the serial version of the SELECT plan in case it is very different from the parallel version, but as Jonathan Kehayias notes in his guide to Handling Deadlocks, this particular deadlock pattern is very sensitive to timing and internal query execution implementation details. This type of deadlock often comes and goes without an obvious external reason.

Given access to the system concerned, and suitable permissions, I am certain we could eventually work out exactly why the deadlock occurs with the parallel plan but not the serial (assuming the same general shape). Potential lines of enquiry include checking for optimized nested loops and/or prefetching - both of which can internally escalate the isolation level to REPEATABLE READ for the duration of the statement. It is also possible that some feature of parallel index seek range assignment contributes to the issue. If the serial plan becomes available, I might spend some time looking into the details further, as it is potentially interesting.

The usual solution for this type of deadlocking is to make the index covering, though the number of columns in this case might make that impractical (and besides, we are not supposed to mess with such things on SharePoint, I am told). Ultimately, the recommendation for serial-only plans when using SharePoint is there for a reason (though not necessarily a good one, when it comes right down to it). If the change in cost threshold for parallelism fixes the issue for the moment, this is good. Longer term, I would probably look to separate the workloads, perhaps using Resource Governor so that SharePoint internal queries get the desired MAXDOP 1 behaviour and the other application is able to use parallelism.
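A rough sketch of the Resource Governor idea follows. All object names are hypothetical, and the classification rule (matching on APP_NAME()) is purely an assumption for illustration; a real deployment would need to classify on whatever reliably identifies the SharePoint connections, such as login name. It also assumes an edition of SQL Server that supports Resource Governor.

```sql
USE master;
GO

-- Hypothetical workload group limiting its sessions to serial plans.
CREATE WORKLOAD GROUP SharePointSerial
WITH (MAX_DOP = 1);
GO

-- Hypothetical classifier: the APP_NAME() pattern is an assumption.
-- Classifier functions must live in master and be schema-bound.
CREATE FUNCTION dbo.HypotheticalClassifier ()
RETURNS sysname
WITH SCHEMABINDING
AS
BEGIN
    RETURN
        CASE
            WHEN APP_NAME() LIKE N'%SharePoint%' THEN N'SharePointSerial'
            ELSE N'default'
        END;
END;
GO

ALTER RESOURCE GOVERNOR
WITH (CLASSIFIER_FUNCTION = dbo.HypotheticalClassifier);

ALTER RESOURCE GOVERNOR RECONFIGURE;
GO
```

Sessions are classified at login, so existing connections keep their current workload group until they reconnect.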

The question of exchanges appearing in the deadlock trace seems a red herring to me; simply a consequence of the independent threads owning resources which technically must appear in the tree. I cannot see anything to suggest that the exchanges themselves are contributing directly to the deadlocking issue.

SQL Server – Rejoining range seek on nullable composite index

Is there any way to write a query that just seeks into that composite index key and then follows it along to retrieve the next chunk of 1000 rows?

A favourite solution of mine is to use an API cursor:

SET NOCOUNT ON;
SET STATISTICS IO ON;

DECLARE 
    @cur integer,
    -- FAST_FORWARD, AUTO_FETCH, AUTO_CLOSE, CHECK_ACCEPTED_TYPES, FAST_FORWARD_ACCEPTABLE
    @scrollopt integer = 16 | 8192 | 16384 | 32768 | 1048576,
    -- READ_ONLY, CHECK_ACCEPTED_OPTS, READ_ONLY_ACCEPTABLE
    @ccopt integer = 1 | 32768 | 65536, 
    @rowcount integer = 1000,
    @rc integer;

-- Open the cursor and return (up to) the first 1000 rows
EXECUTE @rc = sys.sp_cursoropen
    @cur OUTPUT,
    N'
    SELECT A, B, C
    FROM T
    ORDER BY A, B;
    ',
    @scrollopt OUTPUT,
    @ccopt OUTPUT,
    @rowcount OUTPUT;

IF @rc <> 16 -- FastForward cursor automatically closed
BEGIN
    -- Name the cursor so we can use CURSOR_STATUS
    EXECUTE sys.sp_cursoroption
        @cur, 
        2, 
        'MyCursorName';

    -- Until the cursor auto-closes
    WHILE CURSOR_STATUS('global', 'MyCursorName') = 1
    BEGIN
        EXECUTE sys.sp_cursorfetch
            @cur,
            2,
            0,
            1000;
    END;
END;

SET STATISTICS IO OFF;

The overall strategy is a single scan that remembers its position between calls. Using an API cursor means we can return a block of rows rather than one at a time as would be the case with a T-SQL cursor:

(Execution plan images)

The STATISTICS IO output is:

Table 'T'. Scan count 1, logical reads 1011, physical reads 0, read-ahead reads 0
Table 'T'. Scan count 1, logical reads 1001, physical reads 0, read-ahead reads 0
Table 'T'. Scan count 1, logical reads 516, physical reads 0, read-ahead reads 0