Index seeks and predicate push down

execution-planperformancequery-performancesql serversql-server-2019

I have a query of the following form:

IF EXISTS (
    SELECT    1
    FROM    (
                SELECT    RowID, OETID
                FROM    @InMemoryTableTypeTable i
                UNION
                SELECT    RowID, OETID
                FROM    @InMemoryTableTypeTable d
            ) AS t               
    WHERE    NOT EXISTS (
                SELECT    1
                FROM    dbo.MyTable m WITH(FORCESEEK, ROWLOCK, UPDLOCK)
                WHERE    (m.OETID = t.RowID)
                        AND (m.SRID = t.OETID)
                        AND (m.WTID = @WTID)
                        AND (m.Status <> 1)
                        AND (m.SRID > 0)
            )
)
...

The definition of dbo.MyTable is:

CREATE TABLE [dbo].[MyTable](
    [ID] [bigint] IDENTITY(1,1) NOT NULL,
    [RowGUID] [uniqueidentifier] ROWGUIDCOL  NOT NULL,
    [WTID] [bigint] NOT NULL,
    [OETID] [int] NOT NULL,
    [SRID] [bigint] NOT NULL,
    [Status] [tinyint] NOT NULL,
 CONSTRAINT [PK_MyTable] PRIMARY KEY CLUSTERED 
(
    [ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
GO

CREATE UNIQUE NONCLUSTERED INDEX [IDX] ON [dbo].[MyTable]
(
    [WTID] ASC,
    [OETID] ASC,
    [SRID] ASC
)
INCLUDE([Status]) 
WHERE ([SRID]>(0))
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
GO

ALTER TABLE [dbo].[MyTable] ADD  CONSTRAINT [DF_MyTable_RowGUID]  DEFAULT (NEWID()) FOR [RowGUID]
GO

The definition of @InMemoryTableTypeTable is

CREATE TYPE [dbo].[TableType] AS TABLE(
    [ID] [bigint] NOT NULL,
    [RowID] [int] NOT NULL,
    [OETID] [int] NOT NULL,
     PRIMARY KEY NONCLUSTERED 
(
    [ID] ASC
)
)
WITH ( MEMORY_OPTIMIZED = ON )
GO

The table MyTable contains ~500k rows and has a unique filtered index that has:

  • WTID, OETID and SRID as keys in that order
  • a filter where SRID > 0
  • Status as an included column

This means that the EXISTS statement is SARGable.

However, depending on how many records are in @InMemoryTableTypeTable and what mood SQL Server seems to be in, sometimes the index seek will only seek on WTID and push the rest of the predication up into the Left Anti Semi Join. If this happens and memory on the SQL Server itself is under pressure, the query can sit there for 20 minutes or so. For some values of @WTID there could be 1 row or there could be 200k that have just been inserted earlier in the same session.

Here is the good plan: https://www.brentozar.com/pastetheplan/?id=H1-V_Jz7R

Here is the bad plan: https://www.brentozar.com/pastetheplan/?id=SJD-QZGQA

Is there a way to force SQL Server to apply the predication to all 3 columns in the Index Seek every time?

I have tried breaking this out of the IF and using the OPTIMIZE FOR UNKNOWN and OPTIMIZE FOR (@WTID UNKNOWN) hints to no avail.


The seek is more for concurrency: the reads and writes within that table for each session will be segregated by WTID. However, removing those table hints makes no difference, it always scans t and seeks into m, it's the position of the OETID and SRID predication that appears to make the difference.

This post Actual and Estimated rows differ greatly led me to the ASSUME_MIN_SELECTIVITY_FOR_FILTER_ESTIMATES hint which produces the plan I want (most of the time) along with RECOMPILE. Combining this with FORCE_LEGACY_CARDINALITY_ESTIMATION reverts to the "wrong" plan.

Best Answer

Solution

You can use an expanded FORCESEEK hint with the seek keys needed:

FROM dbo.MyTable AS M
    WITH (FORCESEEK(IDX(WTID, OETID, SRID)))

For example:

IF EXISTS 
(
    SELECT T.*
    FROM
    (
        SELECT I.RowID, I.OETID
        FROM @InMemoryTableTypeTable AS I
        UNION
        SELECT D.RowID, D.OETID
        FROM @InMemoryTableTypeTable AS D
    ) AS T
    WHERE    
        NOT EXISTS 
        (
            SELECT M.*
            FROM dbo.MyTable AS M 
                WITH 
                (
                    FORCESEEK(IDX(WTID, OETID, SRID)), -- !CHANGED!
                    ROWLOCK,
                    UPDLOCK
                )
            WHERE    
                M.OETID = T.RowID
                AND M.SRID = T.OETID
                AND M.WTID = @WTID
                AND M.[Status] <> 1
                AND M.SRID > 0
        )
)
...

Explanation

As you noticed, the cause is cost estimation. Queries introduced by EXISTS come with a row goal, which complicates matters. Coupled with a query that contains several other hard-to-estimate features, this is the sort of situation where a hint may be required to get the desired plan shape consistently.

Costing and exploration

The 'good' and 'bad' plans both have very low total estimated costs, so the optimizer doesn't spend much time searching (note the Good Enough Plan Found reason for early termination at the root operator).

The optimizer considers many alternatives you don't see reflected in the final plan. Without the low cost found early, it would go on to consider further strategies like the one illustrated below that pushes the anti semi join (as an apply) below the union:

anti semi join below union all

Regardless, the optimizer isn't 'confused by the UNION'—it just didn't get as far as considering an APPLY, only a join (which could be implemented as nested loops, hash, or merge).

The nested loops join plan does have a seek on the inner side, but that's the uncorrelated predicate WTID = @WTID, which could also feature in a hash or merge join. The remaining predicates are all correlated, so they would require an apply to push down. See my article, Apply versus Nested Loops Join if the concepts there are unclear.

IF EXISTS

Ordinarily, you could use a OPTION (USE HINT ('DISABLE_OPTIMIZER_ROWGOAL')) query hint to disable the row goal behaviour, which would very likely reliably produce the desired plan as well.

Unfortunately, query hints only apply to the top-level query (IF EXISTS here) and not the nested query (the one you're concerned about). You did see different plans when adding hints, but this was because the query text was different and so required a fresh compilation.

Using IF EXISTS, you'd need to set documented trace flag 4138 to disable the row goal. It would have to be set at session level (using DBCC TRACEON), because QUERYTRACEON also only applies to the top level.

You can avoid this non-obvious behaviour by using a pattern like the following, instead of IF EXISTS (see the related Q & A below)

DECLARE @Exists bit;

SELECT @Exists =
    IIF
    (
        EXISTS
        (
            --
            -- The query without IF EXISTS
            --
        ),
        CONVERT(bit, 'true'),
        CONVERT(bit, 'false')
    )
-- Query hints are effective now
OPTION (USE HINT ('DISABLE_OPTIMIZER_ROWGOAL'));

IF @Exists = CONVERT(bit, 'true')
BEGIN
    --
    -- Thing to do IF EXISTS
    --
END;

I mention all that for interest. Since you're already using multiple hints, expanding FORCESEEK is the way to go.


Related Q & A: