I'm struggling to persuade a query plan to behave as I think it should. The addition of a TOP clause when querying an indexed view is causing a sub-optimal plan, and I'm hoping for some help in sorting it.
Environment
- SQL Server 2019
- StackOverflow2013 database (50GB version), Compat Mode 150 (problem is not specific to this version)
The setup:
Firstly, I've created a view to return everyone with a high reputation:
CREATE VIEW vwHighReputation
WITH SCHEMABINDING
AS
SELECT [Id],
[DisplayName],
[Reputation]
FROM [dbo].[Users]
WHERE [Reputation] > 10000
Next, since I'll be searching by display name, I've created a couple of indexes on the view:
CREATE UNIQUE CLUSTERED INDEX IX_Users_Id ON [dbo].[vwHighReputation]([Id])
GO
CREATE NONCLUSTERED INDEX IX_Users_DisplayName ON [dbo].[vwHighReputation]([DisplayName]) INCLUDE (Reputation)
GO
If I query via the view, I can see my nonclustered index is being used:
SELECT *
FROM [dbo].[vwHighReputation]
WHERE [DisplayName] LIKE 'J%'
Plan: (https://www.brentozar.com/pastetheplan/?id=Sy2EoJaiv)
So far so good. I can even use my view as part of a more complex query with an OUTER APPLY, and I still get a seek with only 63 reads against my index (this is obviously a contrived example, but helps illustrate the problem that I'll come to):
SELECT [U].[Id],
[A].[Reputation],
[A].[DisplayName]
FROM [dbo].[Users] AS [U]
OUTER APPLY (
SELECT *
FROM [dbo].[vwHighReputation] AS [v]
WHERE [v].[Id] = [U].[Id]
) AS [A]
WHERE [A].[DisplayName] LIKE 'J%';
Plan: https://www.brentozar.com/pastetheplan/?id=HJaw3y6ov
However, if I add a TOP 1 to my OUTER APPLY:
SELECT [U].[Id],
[A].[Reputation],
[A].[DisplayName]
FROM [dbo].[Users] AS [U]
OUTER APPLY (
SELECT TOP 1 *
FROM [dbo].[vwHighReputation] AS [v]
WHERE [v].[Id] = [U].[Id]
) AS [A]
WHERE [A].[DisplayName] LIKE 'J%';
Then the situation gets bad….very, very bad….
Plan: https://www.brentozar.com/pastetheplan/?id=HyOS6yaiw
My logical read count against that view is now almost 5 million. I can see from the plan that SQL Server is now choosing to perform a seek on the clustered index with the User's ID as the predicate, but doing so around 2.5 million times. It is also scanning the whole of the Users table. It no longer seeks on the view's index.
Obviously the optimiser is deciding that this is the most efficient approach, but I can't understand why! I've think it's probably to do with the way the underlying tables are sorted, but I'm not sure.
Incidentally, re-writing it as a simple SUB QUERY rather than CROSS APPLY yeilds the same outcome.
Any help or advice would be great!
Best Answer
Outer Apply
You're using
OUTER APPLY
, but with a where clause that would rejectNULL
values.It's converted to an inner join without the
TOP (1)
:I've formatted your code a little bit, and added an
ORDER BY
to validate results across queries. No offense.Outer Apply + TOP (1)
When you use the
TOP (1)
, the join is of theLEFT OUTER
variety:The
TOP (1)
inside theOUTER APPLY
apparently makes the optimizer unable to apply the same transformation to an inner join, even with a redundant predicate:Note the residual predicates to evaluate if the
Id
andDisplayName
columns areNULL
.This isn't just a
TOP (1)
issue either -- you can substitute any values up to the big int max (9223372036854775807) and see the same plan.It will also happen if you skip the view entirely.
A Rewrite
One way to get the same effect as
TOP (1)
without the various optimizer side effects ofTOP
is to useROW_NUMBER
Which will get you the original plan: