SQL Server Temp Table Index Seek vs Scan Based on Row Count

execution-planquery-performancesql server

I have the following table and data:

CREATE TABLE myTable (
    ID INT IDENTITY(1,1) PRIMARY KEY,
    Column1 VARCHAR(50),
    Column2 VARCHAR(50),
    Column3 VARCHAR(50),
    Column4 VARCHAR(50),
    Column5 VARCHAR(50),
    Column6 VARCHAR(50),
    Column7 VARCHAR(50),
    Column8 VARCHAR(50),
    Column9 VARCHAR(50),
    Column10 VARCHAR(50)
)

DECLARE @i INT = 1
DECLARE @j INT = 1
DECLARE @distinct_value_count INT = 20
DECLARE @distinct_value_count_with_more_rows INT = 3
DECLARE @rows_per_distinct_value INT = (20000 - (@distinct_value_count_with_more_rows * 2000)) / (@distinct_value_count - @distinct_value_count_with_more_rows)

WHILE @i <= @distinct_value_count
BEGIN
    DECLARE @current_rows_per_value INT = @rows_per_distinct_value
    IF @i <= @distinct_value_count_with_more_rows
    BEGIN
        SET @current_rows_per_value = @rows_per_distinct_value + 2000
    END
    
    SET @j = 1
    WHILE @j <= @current_rows_per_value
    BEGIN
        INSERT INTO myTable (Column1, Column2, Column3, Column4, Column5, Column6, Column7, Column8, Column9, Column10)
        VALUES ('Value' + CAST(@i AS VARCHAR(2)),
                'Value' + CAST(@j AS VARCHAR(5)),
                'Value' + CAST(@j + 1 AS VARCHAR(5)),
                'Value' + CAST(@j + 2 AS VARCHAR(5)),
                'Value' + CAST(@j + 3 AS VARCHAR(5)),
                'Value' + CAST(@j + 4 AS VARCHAR(5)),
                'Value' + CAST(@j + 5 AS VARCHAR(5)),
                'Value' + CAST(@j + 6 AS VARCHAR(5)),
                'Value' + CAST(@j + 7 AS VARCHAR(5)),
                'Value' + CAST(@j + 8 AS VARCHAR(5)))
        SET @j = @j + 1
    END
    
    SET @i = @i + 1
END

Alter Table dbo.myTable
Add Column11 varchar(50), Column12 varchar(50)

Alter Table dbo.myTable
Add dateModified datetime

Update dbo.myTable
  set Column11 = Column1
     ,Column12 = Column1

Update Top (10) dbo.myTable
   Set Column11 = 'Value7'
  Where Column1 = 'Value1'

Update Top (10) dbo.myTable
   Set Column12 = 'Value7'
  Where Column1 = 'Value1'

Update Top (10) dbo.myTable
   Set Column11 = 'Value6'
  Where Column1 = 'Value1'

Update Top (10) dbo.myTable
   Set Column12 = 'Value6'
  Where Column1 = 'Value1'

Update Top (10) dbo.myTable
   Set Column11 = 'Value5'
  Where Column1 = 'Value1'

Update Top (10) dbo.myTable
   Set Column12 = 'Value5'
  Where Column1 = 'Value1'

Update dbo.myTable
  set dateModified = getdate() + ID

CREATE NONCLUSTERED INDEX [Idx_col] ON [dbo].[myTable]
(
    [Column1] ASC,
    [Column11] ASC,
    [Column12] ASC,
    [dateModified] ASC
)
INCLUDE([Column5],[Column6]) 

I have to filter based on a few columns and return all the columns from the table. In order to do that, I have an index that covers the columns that need to be filtered. I'm breaking down the query into two parts:

  1. Get all the primary key rows that satisfy the filter and store them in a temp table. This query makes use of the non-clustered index.

  2. Join this temp table back to main table on the primary key column so that the clustered index is used to get all the columns.

However, i'm facing an issue when I try to do this. In the first scenario I'm getting all the filtered rows into a temp table and then when I join it back to main table, it's doing a clustered index scan. In the second scenario, I'm only getting the top 50 rows into temp table and when I join this to the main table, it's doing a clustered index seek. I'm confused as to why this is happening. In both cases, there is no index on the temp table. I would appreciate it if anyone can help me understand what's going on. Thank you!

Scenario 1:

SELECT id
INTO #tmpId
FROM myTable
WHERE Column1= 'Value1'
AND Column11 In( 'Value1','Value5','Value6', 'Value7')
And Column12 In ('Value1','Value6')
And dateModified > dateAdd(day,-5, getdate())

SELECT *
FROM myTable m
JOIN #tmpId t
ON m.id = t.id

drop table if exists #tmpId

Execution Plan Scenario 1: https://www.brentozar.com/pastetheplan/?id=rkDAD-aLh

Scenario 2:

SELECT id
INTO #tmpId
FROM myTable
WHERE Column1= 'Value1'
AND Column11 In( 'Value1','Value5','Value6', 'Value7')
And Column12 In ('Value1','Value6')
And dateModified > dateAdd(day,-5, getdate())
Order by dateModified desc offset 0 rows fetch next 50 rows only

SELECT *
FROM myTable m
JOIN #tmpId t
ON m.id = t.id

drop table if exists #tmpId

Scenario 2 Execution Plan: https://www.brentozar.com/pastetheplan/?id=rJVbuWaLh

Best Answer

Index seek is beneficial for retrieving a relatively small amount of data. And it can drastically slow down a query that retrieves a large number of rows. Sometimes inapropriate use of index seek can slow down a query for hours. That's why information about a number of rows to be selected is crucial for query optimization.

When you select 50 rows index seek works perfectly. But it looks like your data for 5 days is too large for index seek to be faster than index scan. That's why optimizer uses index scan this time.

You can also try using FORCESEEK and FORCESCAN optimizer hints to monitor the difference.