Sql-server – “Warnings: Operation caused residual I/O” versus key lookups

execution-plan, nonclustered-index, optimization, sql-server, sql-server-2017

I've seen this warning in SQL Server 2017 execution plans:

Warnings: Operation caused residual IO [sic]. The actual number of rows read was (3,321,318), but the number of rows returned was 40.

Here is a snippet from SQL Sentry Plan Explorer:

To improve the code, I've added a nonclustered index so SQL Server can get to the relevant rows. It works fine, but normally there would be too many (large) columns to include in the index. It looks like this:

If I add the index without included columns and force its use, the plan looks like this:

Obviously, SQL Server thinks the key lookups are much more expensive than the residual I/O. My test setup does not have much data (yet), but when the code goes into production it needs to work with much more data, so I'm fairly sure that some sort of nonclustered index is needed.

When running on SSDs, are key lookups really so expensive that I have to create full-fat indexes (with a lot of included columns)?

Execution plan: https://www.brentozar.com/pastetheplan/?id=SJtiRte2X (it is part of a long stored procedure; look for IX_BatchNo_DeviceNo_CreatedUTC).

Best Answer

The cost model used by the optimizer is exactly that: a model. It produces generally good results over a wide range of workloads, on a wide range of database designs, on a wide range of hardware.

You should generally not assume that individual cost estimates will strongly correlate with runtime performance on a particular hardware configuration. The point of costing is to allow the optimizer to make an educated choice between candidate physical alternatives for the same logical operation.

When you really get into the details, a skilled database professional (with the time to spare on tuning an important query) can often do better. To that extent, you can think of the optimizer's plan selection as a good starting point. In most cases, that starting point will also be the ending point, since the solution found is good enough.

In my experience (and opinion) the SQL Server query optimizer costs lookups higher than I would prefer. This is largely a hangover from the days when random physical I/O was much more expensive compared to sequential access than is often the case today.

Still, lookups can be expensive even on SSDs, or ultimately even when reading exclusively from memory. Traversing b-tree structures is not free, and the cost obviously mounts as you do more of them.

Included columns are great for read-heavy OLTP workloads, where the trade-off between index space usage and update cost versus runtime read performance makes sense. There is also a trade-off to consider around plan stability. A fully covering index avoids the question of when exactly the optimizer's cost model might transition from one alternative to the other.

Only you can decide if the trade-offs are worth it in your case. Test both alternatives on a representative data sample, and make an informed choice.
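Testing both alternatives could look something like the following minimal sketch. Everything here except the index name IX_BatchNo_DeviceNo_CreatedUTC is hypothetical: the dbo.DeviceReadings table, its columns, the generated data, and the covering index name. The script builds a key-only index and a covering variant, then runs the same query against each with SET STATISTICS IO and TIME enabled, so logical reads and CPU time can be compared directly rather than through estimated costs.

```sql
-- Hypothetical test table; names, sizes, and data distribution are illustrative only.
DROP TABLE IF EXISTS dbo.DeviceReadings;

CREATE TABLE dbo.DeviceReadings
(
    ReadingID bigint NOT NULL
        CONSTRAINT PK_DeviceReadings PRIMARY KEY CLUSTERED,
    BatchNo integer NOT NULL,
    DeviceNo integer NOT NULL,
    CreatedUTC datetime2(3) NOT NULL,
    Payload char(500) NOT NULL -- stands in for the wide non-key columns
);

-- 100,000 rows spread over 100 batches and 50 devices.
INSERT dbo.DeviceReadings
    (ReadingID, BatchNo, DeviceNo, CreatedUTC, Payload)
SELECT
    Nums.n,
    Nums.n % 100,
    Nums.n % 50,
    DATEADD(SECOND, CONVERT(integer, Nums.n), CONVERT(datetime2(3), '2018-01-01')),
    'x'
FROM
(
    SELECT TOP (100000)
        n = ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
    FROM sys.all_columns AS A
    CROSS JOIN sys.all_columns AS B
) AS Nums;

-- Alternative 1: key columns only; Payload must be fetched with key lookups.
CREATE INDEX IX_BatchNo_DeviceNo_CreatedUTC
ON dbo.DeviceReadings (BatchNo, DeviceNo, CreatedUTC);

-- Alternative 2: covering; no lookups, but a much wider index to store and maintain.
CREATE INDEX IX_BatchNo_DeviceNo_CreatedUTC_Covering
ON dbo.DeviceReadings (BatchNo, DeviceNo, CreatedUTC)
INCLUDE (Payload);

SET STATISTICS IO, TIME ON;

-- Seek plus one key lookup per qualifying row.
SELECT R.CreatedUTC, R.Payload
FROM dbo.DeviceReadings AS R
    WITH (INDEX (IX_BatchNo_DeviceNo_CreatedUTC))
WHERE R.BatchNo = 7
    AND R.DeviceNo = 7;

-- Seek only.
SELECT R.CreatedUTC, R.Payload
FROM dbo.DeviceReadings AS R
    WITH (INDEX (IX_BatchNo_DeviceNo_CreatedUTC_Covering))
WHERE R.BatchNo = 7
    AND R.DeviceNo = 7;

SET STATISTICS IO, TIME OFF;
```

On this synthetic data the covering index would typically report far fewer logical reads for the same 1,000 rows, because each key lookup is a separate clustered index traversal. Whether that saving justifies the wider index depends on the real data and the write workload.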

In a question comment you added:

Are you telling me that SQL Server does not know the cost of the residual IO?

No, the optimizer does consider the cost of residual I/O. Indeed, as far as the optimizer is concerned, non-SARGable predicates are evaluated in a separate Filter. This filter is pushed into the seek or scan as a residual during post-optimization rewrites.
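To show where a residual predicate comes from, here is a minimal sketch using a hypothetical temporary table (#Readings, its columns, and its data are invented for illustration). In the first query only BatchNo = 5 can be a seek predicate. Applying YEAR() to the column makes the date test non-SARGable, so it is evaluated as a residual Predicate on every row the seek reads, and the actual plan shows Number of Rows Read well above Actual Number of Rows. Rewriting the date test as a range on the bare column lets both predicates become seek predicates.

```sql
DROP TABLE IF EXISTS #Readings;

CREATE TABLE #Readings
(
    ReadingID integer NOT NULL PRIMARY KEY,
    BatchNo integer NOT NULL,
    CreatedUTC datetime2(0) NOT NULL,
    INDEX IX_BatchNo_CreatedUTC (BatchNo, CreatedUTC)
);

-- 10,000 hourly rows over 10 batches, starting 2017-06-01.
INSERT #Readings (ReadingID, BatchNo, CreatedUTC)
SELECT
    Nums.n,
    Nums.n % 10,
    DATEADD(HOUR, Nums.n, CONVERT(datetime2(0), '2017-06-01'))
FROM
(
    SELECT TOP (10000)
        n = CONVERT(integer, ROW_NUMBER() OVER (ORDER BY (SELECT NULL)))
    FROM sys.all_columns AS A
    CROSS JOIN sys.all_columns AS B
) AS Nums;

-- Seek predicate: BatchNo = 5 (about 1,000 rows read).
-- Residual predicate: YEAR(CreatedUTC) = 2018, tested on each row read.
SELECT R.ReadingID
FROM #Readings AS R
WHERE R.BatchNo = 5
    AND YEAR(R.CreatedUTC) = 2018;

-- SARGable rewrite: the seek range covers both key columns,
-- so rows read is close to rows returned.
SELECT R.ReadingID
FROM #Readings AS R
WHERE R.BatchNo = 5
    AND R.CreatedUTC >= '20180101'
    AND R.CreatedUTC < '20190101';
```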

Rowstore

The sort spill itself can probably be addressed by enabling trace flag 7470. See FIX: Sort operator spills to tempdb when estimated number of rows and row size are correct. This trace flag corrects an oversight in the calculation. It is quite safe to use, and in my opinion ought to be on by default. The change is protected by a trace flag simply to avoid unexpected plan changes.
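For reference, a minimal sketch of enabling the trace flag for the running instance with DBCC TRACEON. It assumes sysadmin permissions and a build that contains the fix from the KB article; a flag set this way does not survive a restart, so a permanent setting would use the -T7470 startup parameter instead.

```sql
-- Enable trace flag 7470 for all sessions (-1 = global) until the next restart.
-- For a permanent setting, add -T7470 to the SQL Server startup parameters instead.
DBCC TRACEON (7470, -1);

-- Check the flag: expect Status = 1 and Global = 1.
DBCC TRACESTATUS (7470);

-- To revert:
-- DBCC TRACEOFF (7470, -1);
```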

That said, avoiding the sort completely would be better as Rob Farley mentions in his answer. Changing the clustering key is one way to achieve that, but it may not be the optimal choice.

SQL Server chooses not to use your nonclustered index to avoid the sort because that index does not provide the other columns potentially needed for the update. Forcing that index with a hint would produce a plan with a large number of key lookups. The high estimated cost of that explains why the optimizer prefers a sort.

An alternative approach, which the optimizer is not currently able to consider on its own, is to find the keys of rows that will be updated, then fetch the additional columns (via a lookup type of operation) just for those rows. The plan provided shows no rows being updated. If that is the common case, or at least that a small fraction of the rows qualify for update, it might be worth coding that logic explicitly.

Another issue in the execution plan is that each target update row might be associated with multiple source rows. This is why there are ANY aggregates in the Stream Aggregate operator. Given multiple matches on the join keys (and a mismatch on the hash), which row will be used for the update is non-deterministic.

If the update had been written as a MERGE, an error would be thrown when multiple source rows are encountered. It is generally best to write deterministic updates, where each target row is associated with at most one source row.
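A small, self-contained illustration of both points, using hypothetical temporary tables #Source and #Target that only mimic the shape of the real ones. Two changed source rows match the same target key, so the first query reports that key as ambiguous, and the MERGE form of the update fails with error 8672 instead of silently picking one of the rows.

```sql
DROP TABLE IF EXISTS #Source, #Target;

CREATE TABLE #Target
(
    ppw_id integer NOT NULL,
    mytableid integer NOT NULL,
    other_columns varchar(100) NOT NULL,
    row_hash binary(20) NOT NULL,
    PRIMARY KEY (ppw_id, mytableid)
);

CREATE TABLE #Source
(
    loadkey bigint NOT NULL PRIMARY KEY,
    ppw_id integer NOT NULL,
    mytableid integer NOT NULL,
    other_columns varchar(100) NOT NULL,
    row_hash binary(20) NOT NULL
);

INSERT #Target (ppw_id, mytableid, other_columns, row_hash)
VALUES (1, 1, 'old', 0x01), (1, 2, 'old', 0x02);

-- Two changed source rows for target key (1, 1); key (1, 2) is unchanged.
INSERT #Source (loadkey, ppw_id, mytableid, other_columns, row_hash)
VALUES (10, 1, 1, 'new A', 0xAA), (11, 1, 1, 'new B', 0xBB), (12, 1, 2, 'old', 0x02);

-- Target keys with more than one changed source row
-- (the row used by UPDATE ... FROM would be non-deterministic).
SELECT DS.ppw_id, DS.mytableid, source_rows = COUNT_BIG(*)
FROM #Source AS DS
JOIN #Target AS DT
    ON DT.ppw_id = DS.ppw_id
    AND DT.mytableid = DS.mytableid
WHERE DS.row_hash <> DT.row_hash
GROUP BY DS.ppw_id, DS.mytableid
HAVING COUNT_BIG(*) > 1;

-- The same logic written as a MERGE raises error 8672 instead.
DECLARE @merge_error integer = 0;

BEGIN TRY
    MERGE #Target AS DT
    USING #Source AS DS
        ON DT.ppw_id = DS.ppw_id
        AND DT.mytableid = DS.mytableid
    WHEN MATCHED AND DS.row_hash <> DT.row_hash THEN
        UPDATE SET DT.other_columns = DS.other_columns;
END TRY
BEGIN CATCH
    SET @merge_error = ERROR_NUMBER();
END CATCH;

SELECT merge_error = @merge_error; -- 8672 expected
```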

Example

The question does not provide DDL or much background, so the following is a simple approximation where all the non-key columns are represented by a single large column, and cardinality and indexing are inferred from the plan:

DROP TABLE IF EXISTS 
    dbo.dwSource, 
    dbo.dwTarget;

CREATE TABLE dbo.dwSource
(
    loadkey bigint NOT NULL,
    mytableid integer NOT NULL,
    ppw_id integer NOT NULL,
    other_columns varchar(1000) NOT NULL,
    row_hash binary(20) NOT NULL,

    CONSTRAINT PK_dbo_dwSource
        PRIMARY KEY CLUSTERED (loadkey)
);

CREATE TABLE dbo.dwTarget
(
    mytableid integer NOT NULL,
    ppw_id integer NOT NULL,
    other_columns varchar(1000) NOT NULL,
    row_hash binary(20) NOT NULL,

    CONSTRAINT PK_dbo_dwTarget
        PRIMARY KEY CLUSTERED (ppw_id, mytableid)
);

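-- Unsupported, informational-only options: make the optimizer cost plans
-- as if the (empty) tables had production-like row and page counts
UPDATE STATISTICS dbo.dwSource 
WITH ROWCOUNT = 1295450, PAGECOUNT = 100000;

UPDATE STATISTICS dbo.dwTarget 
WITH ROWCOUNT = 1296390, PAGECOUNT = 100000;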

Given that approximate schema (ignoring the nonclustered indexes on source and target), the current update statement is:

UPDATE DT
SET DT.other_columns = DS.other_columns
FROM dbo.dwSource AS DS
JOIN dbo.dwTarget AS DT
    ON DT.ppw_id = DS.ppw_id
    AND DT.mytableid = DS.mytableid
WHERE DS.row_hash <> DT.row_hash;

Giving:

As mentioned, if the number of rows to be updated is relatively small, it may be worth locating the keys only as a first step. To do this optimally, we need a couple of nonclustered indexes, which may be similar to the existing ones:

-- Narrower than the clustered primary key
CREATE UNIQUE INDEX [UQ dbo.dwTarget ppw_id, mytableid (row_hash)]
ON dbo.dwTarget (ppw_id, mytableid) 
INCLUDE (row_hash);

-- Not guaranteed to be unique    
CREATE INDEX [IX dbo.dwSource ppw_id, mytableid (loadkey, row_hash)]
ON dbo.dwSource (ppw_id, mytableid) 
INCLUDE (loadkey, row_hash);

We can then write a query to locate the update keys and ensure that only one source row maps to each target row (arbitrarily choosing the row with the highest loadkey):

-- Find keys for updated rows
SELECT
    DS.ppw_id,
    DS.mytableid,
    loadkey = MAX(DS.loadkey)
INTO #Delta
FROM dbo.dwSource AS DS
WHERE EXISTS
(
    SELECT 1 
    FROM dbo.dwTarget AS DT
    WHERE
        DT.ppw_id = DS.ppw_id
        AND DT.mytableid = DS.mytableid
        AND DT.row_hash <> DS.row_hash
)
GROUP BY
    DS.ppw_id,
    DS.mytableid;

If testing shows this query would benefit from parallelism, you could add an OPTION (USE HINT ('ENABLE_PARALLEL_PLAN_PREFERENCE')) hint to give:

Now that we have the keys, we can tell the optimizer about the uniqueness using:

ALTER TABLE #Delta
ADD PRIMARY KEY CLUSTERED (ppw_id, mytableid);

The final update is then:

UPDATE DT
SET DT.other_columns = DS.other_columns
FROM #Delta AS DEL
JOIN dbo.dwTarget AS DT WITH (INDEX(1))
    ON DT.ppw_id = DEL.ppw_id
    AND DT.mytableid = DEL.mytableid
JOIN dbo.dwSource AS DS
    ON DS.loadkey = DEL.loadkey;

This ensures that non-key columns are only looked up for rows that will actually be updated. The WITH (INDEX(1)) hint ensures that rowset sharing can be used (so the index seek directly provides the location of the update). This can be omitted if testing shows the alternative plan naturally selected by the optimizer is better in practice. Note that it is important for nested loops to be chosen here. You might need to enforce that with a hint like OPTION (FAST 1). If the number of rows updated is truly always a small fraction, the optimizer ought to choose a nested loops plan naturally.
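As a sketch, the final update with the row-goal hint applied might look like this. It reuses #Delta and the example tables from above. OPTION (FAST 1) only encourages nested loops by setting a row goal; the commented-out OPTION (LOOP JOIN) forces nested loops for every join in the statement. Which of them, if either, is needed is something to settle by testing.

```sql
-- Final update with a row goal to steer the optimizer towards nested loops.
UPDATE DT
SET DT.other_columns = DS.other_columns
FROM #Delta AS DEL
JOIN dbo.dwTarget AS DT WITH (INDEX(1))
    ON DT.ppw_id = DEL.ppw_id
    AND DT.mytableid = DEL.mytableid
JOIN dbo.dwSource AS DS
    ON DS.loadkey = DEL.loadkey
OPTION (FAST 1);
-- Stronger alternative (forces nested loops for all joins in the statement):
-- OPTION (LOOP JOIN);
```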

Columnstore

The key-location plan (with the Right Semi Merge Join) is still quite expensive, because all rows from both tables are read and tested.

If you have complete freedom over indexing (and there are no other significant drawbacks), a potentially optimal plan could be obtained via a couple of secondary columnstore indexes on the static (non-updated) columns:

CREATE NONCLUSTERED COLUMNSTORE INDEX nccsi 
ON dbo.dwSource (ppw_id, mytableid, loadkey, row_hash);

CREATE NONCLUSTERED COLUMNSTORE INDEX nccsi 
ON dbo.dwTarget (ppw_id, mytableid, row_hash);

This makes the key-location plan:

At the cost of a memory grant for the hash, this plan provides pure batch-mode operation for all operators (except the parallel insert), and early bitmap semi-join reduction. This will likely execute extremely quickly.

Columnstore performance is most impressive when batch mode execution is used and the data is optimally arranged in highly-compressed segments. Data changes may be slower compared with rowstore, and may contribute to slower performance due to deleted row bitmaps and rows being held in rowstore delta segments. Choosing and maintaining an optimal columnstore configuration is not necessarily trivial; see the documentation starting at Columnstore indexes - Overview.
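A minimal sketch of a routine health check, assuming the nccsi index on dbo.dwTarget from the example and SQL Server 2016 or later (for the DMV and the COMPRESS_ALL_ROW_GROUPS option). The DMV shows how many rows sit in delta row groups (state OPEN or CLOSED) and how many are marked as deleted; REORGANIZE then compresses the delta row groups into columnstore segments.

```sql
-- Row group states and deleted-row counts for columnstore indexes on dbo.dwTarget.
SELECT
    RG.index_id,
    RG.row_group_id,
    RG.state_desc,
    RG.total_rows,
    RG.deleted_rows
FROM sys.dm_db_column_store_row_group_physical_stats AS RG
WHERE RG.object_id = OBJECT_ID(N'dbo.dwTarget')
ORDER BY RG.index_id, RG.row_group_id;

-- Compress all delta row groups, including OPEN ones.
ALTER INDEX nccsi ON dbo.dwTarget
REORGANIZE WITH (COMPRESS_ALL_ROW_GROUPS = ON);
```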

These are the main alternatives for you to investigate and apply (or not) as appropriate to your circumstances.

You could also consider making the hash code column a fixed-length binary rather than varbinary, assuming whatever hashing implementation you are using produces a fixed length result. I am also assuming you are happy to accept the small chance of the hash not detecting a change.
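To illustrate the fixed-length point: HASHBYTES returns varbinary, but a given algorithm always produces the same number of bytes (32 for SHA2_256; 20 for SHA1, which matches the binary(20) column in the example), so the result can be converted and stored as fixed-length binary. The table variable and sample values below are hypothetical.

```sql
DECLARE @Rows table
(
    id integer NOT NULL PRIMARY KEY,
    other_columns varchar(1000) NOT NULL,
    row_hash binary(32) NOT NULL -- fixed length: SHA2_256 always yields 32 bytes
);

INSERT @Rows (id, other_columns, row_hash)
SELECT
    V.id,
    V.other_columns,
    CONVERT(binary(32), HASHBYTES('SHA2_256', V.other_columns))
FROM (VALUES (1, 'alpha'), (2, 'beta')) AS V (id, other_columns);

SELECT id, hash_length = DATALENGTH(row_hash)
FROM @Rows;
```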
