Sql-server – why is it doing an index seek instead of an index scan

execution-plan, sql-server, statistics

I created a sample table as follows:

CREATE TABLE [dbo].[StatisticsDemo](
    [ID] [int] IDENTITY(1,1) NOT NULL,
    [Name] [nvarchar](50) NULL
   ) ON [PRIMARY]

Then I inserted data with the following distribution:

SELECT NAME,COUNT(*) AS COUNT 
FROM StatisticsDemo 
GROUP BY NAME

NAME    COUNT
-------------
AABBCC  59999
XXYYZZ  1
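
The INSERT statements themselves are not shown here. A minimal sketch that would produce this distribution, assuming sys.all_objects cross-joined with itself returns at least 59,999 rows (it is used only as a convenient row source, and this is not necessarily how the data was originally loaded):

```sql
-- Assumption: the cross join of sys.all_objects with itself yields
-- at least 59,999 rows, so TOP can cap the insert at that count.
INSERT INTO dbo.StatisticsDemo ( Name )
SELECT TOP ( 59999 ) N'AABBCC'
FROM sys.all_objects AS a
CROSS JOIN sys.all_objects AS b;

-- The single outlier row.
INSERT INTO dbo.StatisticsDemo ( Name )
VALUES ( N'XXYYZZ' );
```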

Then I created the following nonclustered index:

CREATE NONCLUSTERED INDEX [NCI_STATISTICSDEMO_NAME] ON [dbo].[StatisticsDemo]
(
    [Name] ASC
)

Then I ran the following query:

SELECT NAME FROM [dbo].[StatisticsDemo] 
WHERE NAME = 'AABBCC'

As expected, it returns 59999 rows, but it does an Index Seek on the nonclustered index. As far as I know, it should do an Index Scan, since 99.99% of the rows satisfy the filter in the query.

Can someone please tell me why it is doing an Index Seek instead of an Index Scan?

The purpose of this exercise (I am about to give a presentation on statistics) was to prove that SQL Server consults statistics, before preparing the execution plan, to estimate how many rows match the query's filter criteria, and that it then chooses a SCAN or a SEEK based on the percentage of rows in the table that match. If the number of matching rows is approximately equal to the total number of rows in the table, it should do a SCAN. But that is not happening. The same is true when I run the query below against the AdventureWorks2016 database:

select * from [Sales].[SalesOrderHeader] WHERE SalesOrderID >= 43659 AND 
SalesOrderID <= 73659

The above query returns 30001 rows out of 31465, but it still does a Clustered Index Seek.

I am getting terribly confused, and this is shaking my understanding of the concepts. 🙁 Can someone please help?

PS: I cleared the plan cache as well, but no luck. The SQL Server version is 2016.

Best Answer

The purpose of this exercise (I am about to give a presentation on statistics) was to prove that SQL Server consults statistics, before preparing the execution plan, to estimate how many rows match the query's filter criteria, and that it then chooses a SCAN or a SEEK based on the percentage of rows in the table that match. If the number of matching rows is approximately equal to the total number of rows in the table, it should do a SCAN.

This is incorrect, which explains why you aren't seeing it. The range seek on SalesOrderID between 43659 and 73659 is effectively a partial scan. It uses the B-tree to seek to the point where the scan should begin (so it avoids reading anything lower than 43659) and can stop early if there are rows with values greater than 73659.

For the rows that are in the range, it just reads the pages and follows the linked list to the next leaf page, in exactly the same way as an ordered index scan does.

There is no reason to want a scan here. At best, a scan saves the handful of logical reads needed to navigate from the root to the leaf level to find the start point, but at the expense of reading additional rows outside the range being sought.
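
One way to check this on your own system is to compare logical reads for the optimizer's chosen seek against the same query with a scan requested. The following is a minimal sketch, assuming the AdventureWorks2016 database from the question and using the FORCESCAN table hint only for the comparison; the actual read counts will depend on your copy of the data, so none are quoted here:

```sql
USE AdventureWorks2016;
GO

SET STATISTICS IO ON;

-- Optimizer's own choice: seek to SalesOrderID 43659, then read along
-- the leaf level until the upper bound of the range is passed.
SELECT  *
FROM    Sales.SalesOrderHeader
WHERE   SalesOrderID >= 43659
    AND SalesOrderID <= 73659;

-- Same query with a scan requested via the FORCESCAN hint, so that
-- rows outside the range are also read and then filtered out.
SELECT  *
FROM    Sales.SalesOrderHeader WITH ( FORCESCAN )
WHERE   SalesOrderID >= 43659
    AND SalesOrderID <= 73659;

SET STATISTICS IO OFF;
```

Compare the "logical reads" figures reported on the Messages tab for the two statements.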

Related Solutions

Sql-server – SHOWPLAN does not display a warning but “Include Execution Plan” does for the same query

This:

SET SHOWPLAN_XML ON;
GO
SELECT * FROM sys.objects;
GO

is equivalent to clicking Display Estimated Execution Plan on the toolbar (or pressing Ctrl + L). You'll notice that the query returns no rows, unlike when you use Include Actual Execution Plan (Ctrl + M).

The spill warning is only a runtime warning. There is no way that SQL Server can know, when displaying the estimated plan, that a spill will happen at runtime. This is because a spill is caused by factors that might only be present during certain invocations of the query (for example, when there is memory pressure). The estimated plan knows roughly how much memory it's going to ask for, but it can't know until execution that it isn't going to get it.
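
If you want the actual plan from a script rather than from the toolbar, one option is SET STATISTICS XML, which executes the statement and returns the actual plan alongside the results. A minimal sketch, reusing the sys.objects query from above (whether any runtime warning shows up depends on what happens during that particular execution):

```sql
-- Unlike SHOWPLAN_XML, STATISTICS XML runs the query, so the plan
-- it returns can carry runtime-only information.
SET STATISTICS XML ON;
GO
SELECT * FROM sys.objects;
GO
SET STATISTICS XML OFF;
GO
```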

As an aside, may I recommend* our free tool, SQL Sentry Plan Explorer? I think it provides much more obvious information than Management Studio. I recently wrote a lengthy blog post that can act as a tutorial, and Jonathan Kehayias has a great PluralSight course on it as well.

* Disclaimer: I work for SQL Sentry.

Sql-server – Why is the optimizer doing a table scan vs Index Seek

The optimizer is convinced that if it's going to have to go back to the disk for retrieving column data anyway, it might as well scan the table in the first place, since that'll be less work for it to do. It'll use the seek with the CHAR( 7 ) scalar because the statistics for the index know it's not going to find anything, but if data needs to be returned, it has to consider both CPU and I/O weights.

USE tempdb;
GO

IF NOT EXISTS ( SELECT  1
                FROM    sys.objects
                WHERE   name = 'tst'
                    AND type = 'U' )
BEGIN
    --DROP TABLE dbo.tst;
    CREATE TABLE [dbo].[tst] 
    (
        Mon                     [char](6) NULL,
        COL1                    [varchar](50) NULL,
        COL2                    [varchar](50) NULL,
        COL3                    [varchar](50) NULL,
        COL4                    [varchar](50) NULL,
        COL5                    [varchar](50) NULL
    );

    INSERT INTO dbo.tst ( [Mon] )
    SELECT  TOP 100000000
            CONVERT( CHAR( 6 ), DATEADD( DAY, ( ABS( CHECKSUM( NEWID() ) ) % 10000 + 1 ),
                '20000101' ), 112 )         
    FROM    sys.all_objects so
    CROSS APPLY sys.all_objects sp;

    CREATE NONCLUSTERED INDEX IX__tst__Mon
        ON dbo.tst ( Mon )
    WITH ( DATA_COMPRESSION = PAGE, FILLFACTOR = 100 );
END;

SELECT  Mon, COUNT( 1 )
FROM    dbo.tst
GROUP BY Mon
ORDER BY Mon;

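SET STATISTICS IO, TIME ON;

-- One extra column, optimizer's choice of access path
SELECT  Mon, COL1
FROM    dbo.tst
WHERE   Mon = '201509';

-- One extra column, index forced (seek + RID lookup)
SELECT  Mon, COL1
FROM    dbo.tst WITH ( INDEX = IX__tst__Mon )
WHERE   Mon = '201509';

-- All columns, optimizer's choice of access path
SELECT  *
FROM    dbo.tst
WHERE   Mon = '201509';

-- All columns, index forced (seek + RID lookup)
SELECT  *
FROM    dbo.tst WITH ( INDEX = IX__tst__Mon )
WHERE   Mon = '201509';

SET STATISTICS IO, TIME OFF;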

Specifying the hint does, in both cases, reduce the time required for the query to complete, but the index seek + RID lookup significantly increases the number of reads required (my test indicated a 60% increase). Obviously it's not a 1:1 trade-off, since the time difference is about 6x, but regardless, the optimizer chooses the scan instead.

If you can INCLUDE the columns you need in the index, you'll get the best of both worlds, eliminating the RID lookup and the additional reads.

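-- Recreate the index with the extra columns included;
-- DROP_EXISTING = ON replaces the earlier IX__tst__Mon definition.
CREATE NONCLUSTERED INDEX IX__tst__Mon
    ON dbo.tst ( Mon )
INCLUDE ( COL1, COL2, COL3, COL4, COL5 )
WITH ( DROP_EXISTING = ON, DATA_COMPRESSION = PAGE, FILLFACTOR = 100 );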

SET STATISTICS IO, TIME ON;

SELECT  *
FROM    dbo.tst
WHERE   Mon = '201509'

SET STATISTICS IO, TIME OFF;
