Sql-server – Why does Sql Server makes this bad query plan choice

execution-plansql serversql server 2014

Sql Server gives me a bad query plan and I'm trying to understand why that is.

The query is this:

SELECT TOP (10) 
    [Project1].[C1] AS [C1], 
    [Project1].[Id] AS [Id], 
    [Project1].[SupplierNumber] AS [SupplierNumber], 
    [Project1].[ArticleNumber] AS [ArticleNumber], 
    [Project1].[ArticleName] AS [ArticleName]
    FROM ( SELECT 
        [Extent1].[SupplierNumber] AS [SupplierNumber], 
        [Extent1].[ArticleNumber] AS [ArticleNumber], 
        [Extent1].[Id] AS [Id], 
        [Extent1].[ArticleName] AS [ArticleName], 
        1 AS [C1]
        FROM  [dbo].[SalesEntry] AS [Extent1]
        LEFT OUTER LOOP JOIN [dbo].[Article] AS [Extent2]
            ON ([Extent1].[ArticleNumber] = [Extent2].[ArticleNumber])
                AND ([Extent1].[SupplierNumber] = [Extent2].[SupplierNumber])
        WHERE [Extent2].[id] IS NULL
    )  AS [Project1]
    ORDER BY [Project1].[SupplierNumber] ASC, [Project1].[ArticleNumber] ASC
  OPTION (TABLE HINT ([Extent1], INDEX(IX_Main)))

I have already annotated the query with two hints:

  • the join is forced to be a loop join and
  • I force an index that fits the order by criteria.

With these hints, I get an efficient query plan that looks like this:

Scan index IX_Main on SalesEntry and for each 10 entries, look up the respective article entries with IX_Main on Articles.

Both tables have an IX_Main index on (SupplierNumber, ArticleNumber).

That way the query is fast.

Without the hints, however, Sql Server does a clustered index scan on SalesEntry, which is not useful at all, and an index scan on IX_Main for Article, and then brings the two streams together with a hash match.

That's not so fast, particularly because all the rows of SalesEntry now need to be scanned although we're only interested in the top 10 regarding IX_Main.

I'm confused as to why Sql Server would make that decision.

There's a TOP 10 specifier. That should tell Sql Server that it can get enough rows for the result super-fast with the index it chooses to ignore (IX_Main). It then would need to do only a lousy ten lookups with the index IX_Main on Article.

I already tried and failed to reduce this to a simple example that can be reproduced, so I'm putting this with as much information out there as I think it relevant.

Does anyone have an idea about what Sql Server's thought process might be?

(The query looks a bit weird as it is based on what my ORM, Entity Framework produces.)

EDIT: Here's the problematic plan as xml in a gist.

Best Answer

That top being kind of removed from the order by makes is hard for the query optimizer
It does not need to just do a lousy 10 lookups as it need top 10 WHERE [Extent2].[id] IS NULL
With more statistics the query optimizer may get smarter
I know you are using an ORM but give this a try

SELECT  Top(10) 1 AS [C1],
        [Extent1].[SupplierNumber] AS [SupplierNumber], 
        [Extent1].[ArticleNumber]  AS [ArticleNumber], 
        [Extent1].[Id] AS [Id], 
        [Extent1].[ArticleName] AS [ArticleName]
        FROM [dbo].[SalesEntry] AS [Extent1]
        LEFT OUTER JOIN [dbo].[Article] AS [Extent2]
            ON [Extent1].[ArticleNumber]  = [Extent2].[ArticleNumber]
           AND [Extent1].[SupplierNumber] = [Extent2].[SupplierNumber]
WHERE [Extent2].[id] IS NULL
ORDER BY [Extent1].[SupplierNumber] ASC, [Extent1].[ArticleNumber] ASC