Sql-server – Why does SQL Server make this bad query plan choice

execution-plan, sql-server, sql-server-2014

Sql Server gives me a bad query plan and I'm trying to understand why that is.

The query is this:

SELECT TOP (10) 
    [Project1].[C1] AS [C1], 
    [Project1].[Id] AS [Id], 
    [Project1].[SupplierNumber] AS [SupplierNumber], 
    [Project1].[ArticleNumber] AS [ArticleNumber], 
    [Project1].[ArticleName] AS [ArticleName]
    FROM ( SELECT 
        [Extent1].[SupplierNumber] AS [SupplierNumber], 
        [Extent1].[ArticleNumber] AS [ArticleNumber], 
        [Extent1].[Id] AS [Id], 
        [Extent1].[ArticleName] AS [ArticleName], 
        1 AS [C1]
        FROM  [dbo].[SalesEntry] AS [Extent1]
        LEFT OUTER LOOP JOIN [dbo].[Article] AS [Extent2]
            ON ([Extent1].[ArticleNumber] = [Extent2].[ArticleNumber])
                AND ([Extent1].[SupplierNumber] = [Extent2].[SupplierNumber])
        WHERE [Extent2].[id] IS NULL
    )  AS [Project1]
    ORDER BY [Project1].[SupplierNumber] ASC, [Project1].[ArticleNumber] ASC
  OPTION (TABLE HINT ([Extent1], INDEX(IX_Main)))

I have already annotated the query with two hints:

- the join is forced to be a loop join, and
- I force an index that fits the ORDER BY criteria.

With these hints, I get an efficient query plan that looks like this:

Scan index IX_Main on SalesEntry and, for each of the top 10 entries, look up the respective Article entries with IX_Main on Article.

Both tables have an IX_Main index on (SupplierNumber, ArticleNumber).
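For reference, index definitions consistent with that description might look like the sketch below. Only the index name and key columns come from the description above. Whether the real indexes are nonclustered, unique, or carry included columns is not stated, so those details are assumptions.

```sql
-- Sketch only: key columns as described; NONCLUSTERED and the absence of
-- INCLUDE columns are assumptions, not taken from the actual schema.
CREATE NONCLUSTERED INDEX IX_Main
    ON [dbo].[SalesEntry] ([SupplierNumber], [ArticleNumber]);

CREATE NONCLUSTERED INDEX IX_Main
    ON [dbo].[Article] ([SupplierNumber], [ArticleNumber]);
```

Because the key order matches the ORDER BY of the query, an ordered scan of such an index can return rows already sorted by (SupplierNumber, ArticleNumber).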

That way the query is fast.

Without the hints, however, Sql Server does a clustered index scan on SalesEntry, which is not useful at all, and an index scan on IX_Main for Article, and then brings the two streams together with a hash match.

That's not so fast, particularly because all the rows of SalesEntry now need to be scanned although we're only interested in the top 10 regarding IX_Main.

I'm confused as to why Sql Server would make that decision.

There's a TOP 10 specifier. That should tell Sql Server that it can get enough rows for the result super-fast with the index it chooses to ignore (IX_Main). It then would need to do only a lousy ten lookups with the index IX_Main on Article.

I already tried and failed to reduce this to a simple example that can be reproduced, so I'm putting this out there with as much information as I think is relevant.

Does anyone have an idea about what Sql Server's thought process might be?

(The query looks a bit weird as it is based on what my ORM, Entity Framework, produces.)

EDIT: Here's the problematic plan as xml in a gist.

Best Answer

The TOP being somewhat removed from the ORDER BY makes it hard for the query optimizer.
It cannot get away with just 10 lookups, because it needs the top 10 rows WHERE [Extent2].[id] IS NULL. It may have to probe many SalesEntry rows before it finds 10 that have no matching Article.
With more statistics the query optimizer may get smarter.
I know you are using an ORM, but give this a try:

SELECT TOP (10)
        1 AS [C1],
        [Extent1].[SupplierNumber] AS [SupplierNumber],
        [Extent1].[ArticleNumber]  AS [ArticleNumber],
        [Extent1].[Id] AS [Id],
        [Extent1].[ArticleName] AS [ArticleName]
FROM [dbo].[SalesEntry] AS [Extent1]
LEFT OUTER JOIN [dbo].[Article] AS [Extent2]
    ON [Extent1].[ArticleNumber]  = [Extent2].[ArticleNumber]
   AND [Extent1].[SupplierNumber] = [Extent2].[SupplierNumber]
WHERE [Extent2].[id] IS NULL
ORDER BY [Extent1].[SupplierNumber] ASC, [Extent1].[ArticleNumber] ASC
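Another formulation that may be worth comparing is an explicit anti-join with NOT EXISTS. This is only a sketch: it assumes [dbo].[Article].[id] is never NULL (for example, because it is the primary key), in which case it returns the same rows as the LEFT OUTER JOIN ... IS NULL form. The aliases se and a are made up for illustration, and there is no guarantee the optimizer picks a better plan for it.

```sql
-- Sketch only: assumes [dbo].[Article].[id] is NOT NULL, so "no matching
-- Article row" and "[Extent2].[id] IS NULL" describe the same rows.
SELECT TOP (10)
       1 AS [C1],
       se.[Id],
       se.[SupplierNumber],
       se.[ArticleNumber],
       se.[ArticleName]
FROM   [dbo].[SalesEntry] AS se
WHERE  NOT EXISTS (SELECT 1
                   FROM   [dbo].[Article] AS a
                   WHERE  a.[ArticleNumber]  = se.[ArticleNumber]
                     AND  a.[SupplierNumber] = se.[SupplierNumber])
ORDER BY se.[SupplierNumber] ASC, se.[ArticleNumber] ASC;
```

Comparing the actual execution plans of both forms, with and without the hints, would show whether the optimizer treats them differently on this data.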

Related Solutions

Sql-server – SHOWPLAN does not display a warning but “Include Execution Plan” does for the same query

This:

SET SHOWPLAN_XML ON;
GO
SELECT * FROM sys.objects;
GO

Is equivalent to pressing Display Estimated Execution Plan on the toolbar (or hitting Ctrl + L). You'll notice that no rows are returned from the query, as there are when you use Include Actual Execution Plan (Ctrl + M).

The spill warning is only a runtime warning. There is no way that SQL Server can know, when displaying the estimated plan, that a spill will happen at runtime. This is because a spill is caused by factors that might only be present during certain invocations of the query (for example, when there is memory pressure). The estimated plan knows roughly how much memory it's going to ask for, but it can't know until execution that it isn't going to get it.
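For comparison, a minimal sketch of the T-SQL counterpart to Include Actual Execution Plan is SET STATISTICS XML. Unlike SET SHOWPLAN_XML, it executes the statement, returns its rows, and then returns the actual plan as an additional result set, which is where runtime-only warnings such as spills can appear.

```sql
-- Executes the query and returns the actual plan alongside the rows.
SET STATISTICS XML ON;
GO
SELECT * FROM sys.objects;
GO
SET STATISTICS XML OFF;
GO
```

The query against sys.objects is the same placeholder used above. A spill warning would only show up for a query that actually spills during that particular execution.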

As an aside, may I recommend* our free tool, SQL Sentry Plan Explorer? I think it provides much more obvious information than Management Studio. I recently wrote a lengthy blog post that can act as a tutorial, and Jonathan Kehayias has a great PluralSight course on it as well.

* Disclaimer: I work for SQL Sentry.

Sql-server – Why isn’t the primary (clustered) key being used in this query

The clustered index is partitioned on ReadTime, so it couldn't use the PK as you describe. It would need to find the Max(Id) for each partition and then find the max of those. It is possible to rewrite the query to get such a plan, however.

Using an example based on the article here, a possible rewrite might be:

SELECT MAX(ID) AS ID
FROM   sys.partitions AS P
       CROSS APPLY (SELECT MAX(ID) AS ID
                    FROM   [dbo].[CDSIM_BE]
                    WHERE  $PARTITION.MonthlyArchiveFunction9(ReadTime) 
                                                    = P.partition_number) AS A
WHERE  P.object_id = OBJECT_ID('dbo.CDSIM_BE')
       AND P.index_id <= 1;

This processes each partition in turn.

Note the plan still has a scan (with a seek predicate to select the partition) but this is not a full scan of the partition.

The scan is in index order with direction "BACKWARD". The TOP iterator can stop requesting rows from the scan after the first one is received.
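To see which partitions the outer part of the rewrite iterates over, a quick check against sys.partitions might look like the sketch below. It reuses the table name and the index_id filter from the rewrite. The rows column is the approximate row count that sys.partitions reports per partition.

```sql
-- Lists the partitions of the heap or clustered index (index_id 0 or 1)
-- that the CROSS APPLY in the rewrite would visit, one row per partition.
SELECT P.partition_number,
       P.rows
FROM   sys.partitions AS P
WHERE  P.object_id = OBJECT_ID('dbo.CDSIM_BE')
       AND P.index_id <= 1
ORDER BY P.partition_number;
```

Each partition_number returned here corresponds to one execution of the inner MAX(ID) subquery.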
