Sql-server – SQL Server doesn’t use expected index

execution-planindexsql server

I have the following query (simplified from an Entity Framework query, I don't have direct control of the SQL):

SELECT TOP 15
HouseHoldingRuns.Id,
(
    SELECT TOP 1 ManCoId FROM (
        SELECT TOP 1 GridRuns.Id
        FROM GridRuns
        WHERE HouseHoldingRuns.Id = GridRuns.HouseHoldingRunId
    ) AS [FirstGridRun]
    INNER JOIN Documents ON FirstGridRun.Id = Documents.GridRunId
    ORDER BY Documents.Id
)
FROM HouseHoldingRuns
ORDER BY HouseHoldingRuns.Id

Which produces the following plan:

Without hint
Adding a WITH (FORCESEEK) on the Documents table results in the following plan, which performs much better.

With hint

Why is SQL Server getting it so wrong? I'm guessing the problem lies in the large difference between estimated and actual rows, but I don't know how to fix that… (I've tried updating all stats and rebuilding indexes).

Note: I don't really want the ORDER BY Documents.Id in there, but put it in for Evgeny to see the results of his suggestion.

Best Answer

In the version of the query with FORCESEEK, the majority of the query cost is in the key lookup. You can eliminate this key lookup by creating a nonclustered index on GridRunId and MancoId.

Have a look at the estimated cost figure in the SELECT (root) node of your plans. The optimizer will chose the plan with the lowest estimated cost. By adding a covering nonclustered index, you eliminate the need for the key lookup. This should make the estimated cost of the index seek less than the index scan. You might also want to run the query with OPTION (RECOMPILE) to make sure it uses a fresh execution plan. The index could look something like this:

CREATE NONCLUSTERED INDEX [IX_GridRunId_MancoId] ON [dbo].[Documents]
(
    [GridRunId] ASC,
    [MancoId] ASC
)
WITH 
(
    PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, 
    DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON
)
ON [PRIMARY];

The optimizer would ordinarily cost the scan higher than the seek, but the effect of the TOP means the optimizer computes costs for the desired number of rows (a 'row goal') rather than for the whole set (it estimates that a partial scan will locate the 15 desired rows more efficiently than the seek + lookup combination).