Sql-server – SQL Server Indexed View and TOP

execution-planquery-performancesql serverview

I'm struggling to persuade a query plan to behave as I think it should. The addition of a TOP clause when querying an indexed view is causing a sub-optimal plan, and I'm hoping for some help in sorting it.

Environment

  • SQL Server 2019
  • StackOverflow2013 database (50GB version), Compat Mode 150 (problem is not specific to this version)

The setup:

Firstly, I've created a view to return everyone with a high reputation:

CREATE VIEW vwHighReputation
WITH SCHEMABINDING
AS
SELECT  [Id],
        [DisplayName],
        [Reputation]
FROM    [dbo].[Users]
WHERE   [Reputation] > 10000

Next, since I'll be searching by display name, I've created a couple of indexes on the view:

CREATE UNIQUE CLUSTERED INDEX IX_Users_Id ON [dbo].[vwHighReputation]([Id])
GO
CREATE NONCLUSTERED INDEX IX_Users_DisplayName ON [dbo].[vwHighReputation]([DisplayName]) INCLUDE (Reputation)
GO

If I query via the view, I can see my nonclustered index is being used:

SELECT  *
FROM    [dbo].[vwHighReputation]
WHERE   [DisplayName] LIKE 'J%'

Plan: (https://www.brentozar.com/pastetheplan/?id=Sy2EoJaiv)

So far so good. I can even use my view as part of a more complex query with an OUTER APPLY, and I still get a seek with only 63 reads against my index (this is obviously a contrived example, but helps illustrate the problem that I'll come to):

SELECT  [U].[Id],
        [A].[Reputation],
        [A].[DisplayName]
FROM    [dbo].[Users] AS [U]
        OUTER APPLY (
                        SELECT  *
                        FROM    [dbo].[vwHighReputation] AS [v]
                         WHERE   [v].[Id] = [U].[Id]
                    ) AS [A]
WHERE   [A].[DisplayName] LIKE 'J%'; 

Plan: https://www.brentozar.com/pastetheplan/?id=HJaw3y6ov

However, if I add a TOP 1 to my OUTER APPLY:

SELECT  [U].[Id],
        [A].[Reputation],
        [A].[DisplayName]
FROM    [dbo].[Users] AS [U]
        OUTER APPLY (
                        SELECT  TOP 1 *
                        FROM    [dbo].[vwHighReputation] AS [v]
                        WHERE   [v].[Id] = [U].[Id]
                    ) AS [A]
WHERE   [A].[DisplayName] LIKE 'J%';

Then the situation gets bad….very, very bad….

Plan: https://www.brentozar.com/pastetheplan/?id=HyOS6yaiw

My logical read count against that view is now almost 5 million. I can see from the plan that SQL Server is now choosing to perform a seek on the clustered index with the User's ID as the predicate, but doing so around 2.5 million times. It is also scanning the whole of the Users table. It no longer seeks on the view's index.

Obviously the optimiser is deciding that this is the most efficient approach, but I can't understand why! I've think it's probably to do with the way the underlying tables are sorted, but I'm not sure.

Incidentally, re-writing it as a simple SUB QUERY rather than CROSS APPLY yeilds the same outcome.

Any help or advice would be great!

Best Answer

Outer Apply

You're using OUTER APPLY, but with a where clause that would reject NULL values.

It's converted to an inner join without the TOP (1):

SELECT  
    U.Id,
    A.Reputation,
    A.DisplayName
FROM dbo.Users AS U
OUTER APPLY 
(
    SELECT  
        v.*
    FROM dbo.vwHighReputation AS v
    WHERE v.Id = U.Id
) AS A
WHERE A.DisplayName LIKE 'J%'
ORDER BY U.Id;

NUTS

I've formatted your code a little bit, and added an ORDER BY to validate results across queries. No offense.

Outer Apply + TOP (1)

When you use the TOP (1), the join is of the LEFT OUTER variety:

SELECT  
    U.Id,
    A.Reputation,
    A.DisplayName
FROM dbo.Users AS U
OUTER APPLY 
(
    SELECT TOP (1)
        v.*
    FROM dbo.vwHighReputation AS v
    WHERE v.Id = U.Id
) AS A
WHERE A.DisplayName LIKE 'J%'
ORDER BY U.Id;

NUTS

The TOP (1) inside the OUTER APPLY apparently makes the optimizer unable to apply the same transformation to an inner join, even with a redundant predicate:

SELECT  
    U.Id,
    A.Reputation,
    A.DisplayName
FROM dbo.Users AS U
OUTER APPLY 
(
    SELECT TOP (1)
        v.*
    FROM dbo.vwHighReputation AS v
    WHERE v.Id = U.Id
    AND   v.DisplayName LIKE 'J%'
) AS A
WHERE A.DisplayName LIKE 'J%'
ORDER BY U.Id;

NUTS

Note the residual predicates to evaluate if the Id and DisplayName columns are NULL.

This isn't just a TOP (1) issue either -- you can substitute any values up to the big int max (9223372036854775807) and see the same plan.

It will also happen if you skip the view entirely.

SELECT  
    U.Id,
    A.Reputation,
    A.DisplayName
FROM dbo.Users AS U
OUTER APPLY 
(
    SELECT TOP (1)
        v.Id,
        v.DisplayName,
        v.Reputation
    FROM dbo.Users AS v
    WHERE v.Reputation > 10000 
    AND   v.Id = U.Id
) AS A
WHERE A.DisplayName LIKE 'J%'
ORDER BY U.Id
OPTION(EXPAND VIEWS);

A Rewrite

One way to get the same effect as TOP (1) without the various optimizer side effects of TOP is to use ROW_NUMBER

SELECT  
    U.Id,
    A.Reputation,
    A.DisplayName
FROM dbo.Users AS U
OUTER APPLY 
(
    SELECT
        v.*
    FROM
    (
        SELECT 
            v.*,
            ROW_NUMBER() OVER 
            (
                PARTITION BY 
                    v.Id
                ORDER BY
                    v.Id
            ) AS n
        FROM dbo.vwHighReputation AS v
    ) AS v
    WHERE v.Id = U.Id
    AND   v.n = 1
) AS A
WHERE A.DisplayName LIKE 'J%'
ORDER BY U.Id;

Which will get you the original plan:

NUTS