Schema and indexes are only one aspect of query plan selection and performance. Your statement "but with different data" is likely the source of the difference. The number of rows and the distribution of values are essential inputs to the query optimizer. If you have significantly more rows in D2, or if the data has entirely different characteristics (a wider or narrower range of values), then you should expect to see different performance and different execution plans.
For each statistics object, SQL Server keeps a histogram with a maximum of 200 steps. As the tables grow, and as the distribution of values becomes more irregular, it becomes more likely that SQL Server will not have enough information to generate optimal execution plans. That's where filtered indexes and filtered statistics come into play.
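As a sketch of that idea (the statistics name and date range here are hypothetical, but the syntax is standard), you can create filtered statistics so that a narrow slice of the data gets its own full 200-step histogram:

```sql
-- Hypothetical example: filtered statistics on a single year of dbo.tst.Mon
-- give the optimizer a dedicated histogram for that slice of the data,
-- instead of a few steps inside one table-wide histogram.
CREATE STATISTICS st_tst_Mon_2015
    ON dbo.tst ( Mon )
    WHERE Mon >= '201501' AND Mon <= '201512';
```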
If this is a parameterized query, you may also be running into a parameter sniffing problem. Note that if you're using local variables instead, the cardinality estimate is calculated differently as well.
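One common mitigation, sketched here against the dbo.tst table used later in this answer, is to ask for a fresh plan per execution so the optimizer can see the actual value (this also lets it sniff a local variable's value, which it otherwise cannot):

```sql
-- Hypothetical sketch: a plan compiled for an atypical parameter value can
-- perform badly for common values. OPTION ( RECOMPILE ) trades compile-time
-- CPU for a plan tailored to the value in use at execution time.
DECLARE @Mon char(6) = '201509';

SELECT Mon, COL1
FROM dbo.tst
WHERE Mon = @Mon
OPTION ( RECOMPILE );
```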
The optimizer is convinced that if it's going to have to go back to disk to retrieve column data anyway, it might as well scan the table in the first place, since that will be less work overall. It will use the seek for the CHAR( 7 ) scalar because the statistics for the index tell it that it won't find anything; but when data does need to be returned, it has to weigh both CPU and I/O costs.
USE tempdb;
GO

IF NOT EXISTS ( SELECT 1
                FROM sys.objects
                WHERE name = 'tst'
                  AND type = 'U' )
BEGIN
    --DROP TABLE dbo.tst;
    CREATE TABLE [dbo].[tst]
    (
        Mon [char](6) NULL,
        COL1 [varchar](50) NULL,
        COL2 [varchar](50) NULL,
        COL3 [varchar](50) NULL,
        COL4 [varchar](50) NULL,
        COL5 [varchar](50) NULL
    );

    INSERT INTO dbo.tst ( [Mon] )
    SELECT TOP 100000000
           CONVERT( CHAR( 6 ), DATEADD( DAY, ( ABS( CHECKSUM( NEWID() ) ) % 10000 + 1 ),
                    '20000101' ), 112 )
    FROM sys.all_objects so
    CROSS APPLY sys.all_objects sp;

    CREATE NONCLUSTERED INDEX IX__tst__Mon
        ON dbo.tst ( Mon )
        WITH ( DATA_COMPRESSION = PAGE, FILLFACTOR = 100 );
END;
SELECT Mon, COUNT( 1 )
FROM dbo.tst
GROUP BY Mon
ORDER BY Mon;
SET STATISTICS IO, TIME ON;
SELECT Mon, COL1
FROM dbo.tst
WHERE Mon = '201509'
SELECT Mon, COL1
FROM dbo.tst WITH ( INDEX = IX__tst__Mon )
WHERE Mon = '201509'
SELECT *
FROM dbo.tst
WHERE Mon = '201509'
SELECT *
FROM dbo.tst WITH ( INDEX = IX__tst__Mon )
WHERE Mon = '201509'
SET STATISTICS IO, TIME OFF;
Specifying the hint does, in both cases, reduce the time required for the query to resolve, but the index seek + RID lookup actually results in a significant increase in the number of reads necessary (my test indicated a 60% increase). Obviously it's not a 1:1 trade-off, since the time difference is about 6x; but regardless, left to itself the optimizer chooses the scan.
If you can INCLUDE the columns you need in the index, you'll get the best of both worlds, eliminating the RID lookup and the additional reads.
--DROP INDEX dbo.tst.IX__tst__Mon
CREATE NONCLUSTERED INDEX IX__tst__Mon
    ON dbo.tst ( Mon )
    INCLUDE ( COL1, COL2, COL3, COL4, COL5 )
    WITH ( DATA_COMPRESSION = PAGE, FILLFACTOR = 100, DROP_EXISTING = ON );
SET STATISTICS IO, TIME ON;
SELECT *
FROM dbo.tst
WHERE Mon = '201509'
SET STATISTICS IO, TIME OFF;
Best Answer
The initial seek down the b-tree is to find the first row where CustomerID >= 1. From that point on, the storage engine remembers the current scan position, and returns the next row in index order that qualifies each time a row is requested by a parent plan operator. The scan comes to an end as soon as a row is encountered that does not match the predicate CustomerID <= 70000. The effect is that the 'seek' is an initial seek, followed by a partial ordered scan of the index.
This is usually more efficient than scanning the whole index, even where 90% of the rows are expected to qualify. The key point to get straight in your mind is that a separate b-tree seek is not performed for each row.
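A minimal sketch of the shape of query being described, assuming a hypothetical dbo.Orders table with a b-tree index on CustomerID:

```sql
-- One initial seek positions the scan at the first row with CustomerID >= 1;
-- the engine then reads rows in index order and stops at the first row
-- with CustomerID > 70000. No per-row b-tree traversal takes place.
SELECT CustomerID
FROM dbo.Orders
WHERE CustomerID >= 1
  AND CustomerID <= 70000;
```

In the execution plan this appears as a single Index Seek operator whose seek predicate covers the whole range, not as one seek per qualifying row.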