SQL Server – Why Index Scan is Preferred Over Seek

execution-plansql server

Is how my table looks:

    CREATE TABLE [dbo].[ClosedTaskCustomFields](
    [ClosedTaskId] [uniqueidentifier] NOT NULL,
    [CustomFieldId] [uniqueidentifier] NOT NULL,
    [Value] [nvarchar](450) NULL
    ... 
 CONSTRAINT [PK_ClosedTaskCustomFields] PRIMARY KEY CLUSTERED 
(
    [ClosedTaskId] ASC,
    [CustomFieldId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]

And also I have such index for joining ClosedTask table:

CREATE NONCLUSTERED INDEX [IX_ClosedTaskCustomFields_ClosedTaskId] ON [dbo].[ClosedTaskCustomFields]
(
    [ClosedTaskId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
GO

But I can't get why such query reads all ClosedTaskCustomFields table to join ClosedTask if I have index.

      select count(1) from ClosedTaskCustomFields ctcf
      join ClosedTask c on c.id = ctcf.ClosedTaskId
      where c.State = 'Rejected'
-- Result count is 50k.

plan: https://www.brentozar.com/pastetheplan/?id=SJ9Y4xFEp
enter image description here

Best Answer

SQL Server chooses the plan that seems cheapest based on its estimates. That means the plan you have in mind, with an index seek on both tables, would be costed higher. You can test this for yourself using a FORCESEEK hint.

Note that a higher estimated cost does not necessarily mean the plan would execute more slowly than the chosen alternative on your particular hardware.

There are many reasons the plan shown is costed lower:

  1. The optimizer assumes all queries start with no data in cache.
  2. Sequential I/O is costed lower than random I/O.
  3. Seeks into a b-tree are assumed to be largely random, with an ongoing reduction when seeks are repeated. This attempts to account for the possibility of touching pages previously brought into memory from persistent storage for the same query.
  4. SQL Server does not reduce costs on the inner side of a nested loops join when parallelism is used. This tends to disfavour parallel nested loops plans.
  5. Your plan features a bitmap. This is used to reject rows that cannot join during the scan. It is the reason the scan touches 1,169,875 rows but only returns 51,307 (actual number of rows for all executions).
  6. A hash join has the lowest per-row processing cost of the available join types. The extra cost of building the hash table is offset by this benefit, as well as those provided by the bitmap (not available with nested loops).

You might find the optimizer naturally chooses the seek plan with a MAXDOP 1 hint, or a higher cost threshold for parallelism in general. Seems like you have that set to the default value of 5, which many people consider too low these days.

Otherwise, you're left with the normal way to override optimizer choices—using a hint like FORCESEEK. I wouldn't bother unless the query is crucial and saving a few extra milliseconds is important.

For more on row-mode bitmaps, see my article Bitmap Magic (or… how SQL Server uses bitmap filters).