Sql-server – Why does this Index seek cause a lot more reads than index scan

indexindex-tuningsql server

As I've read that index seeks most of the time are preferred to index scans, I'm trying some stuff out.

I have a query that does 993 reads (checked with SQL Profiler) when it's using an index scan. When using an index seek it needs 44.347 reads. Feels like something is wrong, or I don't understand it.

This is the query for index scan:

select      t5.Id as t5Id
from        table1 t1
left join   table2 t2 on t2.Table1Id = t1.Id
left join   table3 t3 on t3.Table2Id = t2.Id
left join   table4 t4 on t4.Table3Id = t3.Id
left join   table5 t5 on t5.Table4Id = t4.Id

this is the query for index seek:

select      t5.Id as t5Id
from        table1 t1
left join   table2 t2 on t2.Table1Id = t1.Id
left join   table3 t3 on t3.Table2Id = t2.Id
left join   table4 t4 on t4.Table3Id = t3.Id
left join   table5 t5 WITH (FORCESEEK) on t5.Table4Id = t4.Id

The tables are simple and straight forward. At the end I fill them with some dummy data, so it can be reproduced easily.

CREATE TABLE [dbo].[table1](
    [Id] [bigint] IDENTITY(1,1) NOT NULL,
    [Name] [nvarchar](max) NOT NULL,
CONSTRAINT [PK_table1] PRIMARY KEY CLUSTERED 
(
    [Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO




CREATE TABLE [dbo].[table2](
    [Id] [bigint] IDENTITY(1,1) NOT NULL,
    [Table1Id] [bigint] NOT NULL,
CONSTRAINT [PK_table2] PRIMARY KEY CLUSTERED 
(
    [Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

ALTER TABLE     [dbo].[table2] WITH CHECK
ADD CONSTRAINT  [FK_table2_table1Id] FOREIGN KEY([table1Id])
REFERENCES      [dbo].[table1] ([Id])
GO

ALTER TABLE         [dbo].[table2]
CHECK CONSTRAINT    [FK_table2_table1Id]
GO


CREATE NONCLUSTERED INDEX [IdxTable2_FKTable1Id] ON [dbo].[table2]
(
    [Table1Id] ASC
)
INCLUDE (   [Id]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO






CREATE TABLE [dbo].[table3](
    [Id] [bigint] IDENTITY(1,1) NOT NULL,
    [Table2Id] [bigint] NOT NULL,
CONSTRAINT [PK_table3] PRIMARY KEY CLUSTERED 
(
    [Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

ALTER TABLE     [dbo].[table3] WITH CHECK
ADD CONSTRAINT  [FK_table3_table2Id] FOREIGN KEY([table2Id])
REFERENCES      [dbo].[table2] ([Id])
GO

ALTER TABLE         [dbo].[table3]
CHECK CONSTRAINT    [FK_table3_table2Id]
GO


CREATE NONCLUSTERED INDEX [IdxTable3_FKTable2Id] ON [dbo].[table3]
(
    [Table2Id] ASC
)
INCLUDE (   [Id]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO







CREATE TABLE [dbo].[table4](
    [Id] [bigint] IDENTITY(1,1) NOT NULL,
    [Table3Id] [bigint] NOT NULL,
CONSTRAINT [PK_table4] PRIMARY KEY CLUSTERED 
(
    [Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

ALTER TABLE     [dbo].[table4] WITH CHECK
ADD CONSTRAINT  [FK_table4_table3Id] FOREIGN KEY([table3Id])
REFERENCES      [dbo].[table4] ([Id])
GO

ALTER TABLE         [dbo].[table4]
CHECK CONSTRAINT    [FK_table4_table3Id]
GO

CREATE NONCLUSTERED INDEX [IdxTable4_FKTable3Id] ON [dbo].[table4]
(
    [Table3Id] ASC
)
INCLUDE (   [Id]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO





CREATE TABLE [dbo].[table5](
    [Id] [bigint] IDENTITY(1,1) NOT NULL,
    [Table4Id] [bigint] NOT NULL,
    [Description] [nvarchar](2000) NOT NULL,
CONSTRAINT [PK_table5] PRIMARY KEY CLUSTERED 
(
    [Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

ALTER TABLE     [dbo].[table5] WITH CHECK
ADD CONSTRAINT  [FK_table5_table4Id] FOREIGN KEY([table4Id])
REFERENCES      [dbo].[table5] ([Id])
GO

ALTER TABLE         [dbo].[table5]
CHECK CONSTRAINT    [FK_table5_table4Id]
GO

CREATE NONCLUSTERED INDEX [IdxTable5_FKTable4Id] ON [dbo].[table5]
(
    [Table4Id] ASC
)
INCLUDE (   [Id], [Description]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO


set nocount on

DECLARE @i INT = 0;
DECLARE @j INT = 0;
DECLARE @k INT = 0;
DECLARE @l INT = 10;
DECLARE @m INT = 0;

declare @table1Id bigint
declare @table2Id bigint
declare @table3Id bigint
declare @table4Id bigint


begin tran
WHILE @i < 10
BEGIN
    INSERT INTO [dbo].[table1] ([Name]) VALUES (cast(@i as nvarchar(10)))
    SELECT @table1Id = SCOPE_IDENTITY()

    WHILE @j < 10
    BEGIN
        INSERT INTO [dbo].[table2] ([Table1Id]) VALUES (@table1Id)
        SELECT @table2Id = SCOPE_IDENTITY()

        WHILE @k < 10
        BEGIN
            INSERT INTO [dbo].[table3] ([Table2Id]) VALUES (@table2Id)
            SELECT @table3Id = SCOPE_IDENTITY()

            WHILE @l > 0
            BEGIN
                INSERT INTO [dbo].[table4] ([Table3Id]) VALUES (@table3Id)
                SELECT @table4Id = SCOPE_IDENTITY()

                WHILE @m < 10
                BEGIN
                    INSERT INTO [dbo].[table5] ([Table4Id], [Description]) VALUES (@table4Id, 'Not so long description')

                    SET @m = @m + 1;
                END;

                SET @m = 0;
                SET @l = @l - 1;
            END;

            SET @l = 10;
            SET @k = @k + 1;
        END;

        SET @k = 0;
        SET @j = @j + 1;
    END;

    SET @j = 0;
    SET @i = @i + 1;
END;
commit

Best Answer

First let's get an estimate of the number of reads required to do an index scan for IdxTable5_FKTable4Id:

-- get an idea of number of pages in the index
SELECT *
FROM sys.dm_db_partition_stats s
WHERE OBJECT_NAME(s.object_id) IN ('table5')
AND index_id > 1;

On my system the results of that query suggest that SQL Server will need around 900 reads to read the index in full. To test this, I'll run a simple query that is most efficiently implemented as an index scan in SQL Server. The query below needs the ID column for all rows in table5. SQL Server can just look at the index to get all of the data required for the query. Since every row is needed there isn't any wasted work here.

-- get logical reads after query execution
SET STATISTICS IO ON;

select t5.Id
from table5 t5;

Table 'table5'. Scan count 1, logical reads 900, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Now let's consider your first test query that doesn't use the hint. The outer table for the final hash match is estimated to have 9657 rows on my system. The query optimizer decided that an index scan on IdxTable5_FKTable4Id is a good enough plan. Here is the query that I ran:

select      t5.Id as t5Id
from        table1 t1
left join   table2 t2 on t2.Table1Id = t1.Id
left join   table3 t3 on t3.Table2Id = t2.Id
left join   table4 t4 on t4.Table3Id = t3.Id
left join   table5 t5 on t5.Table4Id = t4.Id;

Table 'table5'. Scan count 1, logical reads 900, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

The outer table for the hash join actually had 10000 rows. However, because this is a hash join, SQL Server still needed to scan all 100000 rows from the index even though only 10000 were needed. That's why the same 900 logical reads are needed for this query as the first one. It could be argued that this is wasted effort by the query optimizer. Could it be more efficient to use an index seek to only fetch the 10000 rows that are needed from the index?

First let's get an estimate for the number of reads needed. Your index has a depth of 3:

-- get the index depth of the index
SELECT INDEXPROPERTY ( object_ID('table5'), 'IdxTable5_FKTable4Id' , 'IndexDepth');

Also I will enable TF 8744 to get cleaner results for the purpose of this demo:

-- Trace flag 8744: Disable pre-fetching for ranges
dbcc traceon(8744);

I know that there are 10000 rows from the outer table so one estimate for the number of logical reads is 10000 * (3 + 1) = 40000. Per row, that's 3 for the index depth and 1 to get the data.

Here is the query that was tested:

select      t5.Id as t5Id
from        table1 t1
left join   table2 t2 on t2.Table1Id = t1.Id
left join   table3 t3 on t3.Table2Id = t2.Id
left join   table4 t4 on t4.Table3Id = t3.Id
left join   table5 t5 WITH (FORCESEEK) on t5.Table4Id = t4.Id;

Table 'table5'. Scan count 10000, logical reads 41118, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

That's pretty close to the estimate of 40000.

What did we learn here? For this index there is a cost of about 4 reads per index seek and a fixed cost of 900 reads to do the scan. This means that from an IO perspective using index seeks will only be more efficient when getting a small percentage of the data from the index. Otherwise getting all of the data using an index scan, even if it isn't needed to return correct results for the query, will be more efficient.

For a final test, let's try just getting back the first 1000 rows from the original test query. On my system the query optimizer naturally picks an index seek even without the hint. Here is the query that I ran:

select TOP (1000)    t5.Id as t5Id
from        table1 t1
left join   table2 t2 on t2.Table1Id = t1.Id
left join   table3 t3 on t3.Table2Id = t2.Id
left join   table4 t4 on t4.Table3Id = t3.Id
left join   table5 t5 on t5.Table4Id = t4.Id;

Table 'table5'. Scan count 100, logical reads 409, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

For that query index seeks can be viewed as more efficient than an index scan. Note that the outer table has 100 rows so 100 seeks were done. One seek can return more than 1 row. That's why the logical reads are close to 400 as opposed to 4000.