Schema and indexes are only one aspect of query plan selection and performance. Your statement "but with different data" is likely the source of the difference. The number of rows and the distribution of values are essential inputs to the query optimizer. If you have significantly more rows in D2, or if the data has entirely different characteristics (a wider or narrower range of values), then you should expect to see different performance and different execution plans.
For each statistics object, SQL Server keeps a histogram with a maximum of 200 steps. As the tables grow, and as the distribution of values becomes more irregular, it becomes more likely that SQL Server will not have enough information to generate optimal execution plans. That's where filtered indexes and filtered statistics come into play.
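As a sketch of that idea (the statistics name and date range here are hypothetical, but the syntax is standard), you can create filtered statistics so that a narrow slice of the data gets its own full 200-step histogram:

```sql
-- Hypothetical example: filtered statistics on a single year of dbo.tst.Mon
-- give the optimizer a dedicated histogram for that slice of the data,
-- instead of a few steps inside one table-wide histogram.
CREATE STATISTICS st_tst_Mon_2015
    ON dbo.tst ( Mon )
    WHERE Mon >= '201501' AND Mon <= '201512';
```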
If this is a parameterized query, you may also be running into a parameter sniffing problem. Note that if you're using local variables instead, the cardinality estimate is calculated differently as well.
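One common mitigation, sketched here against the dbo.tst table used later in this answer, is to ask for a fresh plan per execution so the optimizer can see the actual value (this also lets it sniff a local variable's value, which it otherwise cannot):

```sql
-- Hypothetical sketch: a plan compiled for an atypical parameter value can
-- perform badly for common values. OPTION ( RECOMPILE ) trades compile-time
-- CPU for a plan tailored to the value in use at execution time.
DECLARE @Mon char(6) = '201509';

SELECT Mon, COL1
FROM dbo.tst
WHERE Mon = @Mon
OPTION ( RECOMPILE );
```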
The optimizer is convinced that if it's going to have to go back to disk to retrieve column data anyway, it might as well scan the table in the first place, since that will be less work overall. It will use the seek for the CHAR( 7 ) scalar because the statistics for the index tell it that it won't find anything; but when data does need to be returned, it has to weigh both CPU and I/O costs.
USE tempdb;
GO

IF NOT EXISTS ( SELECT 1
                FROM sys.objects
                WHERE name = 'tst'
                  AND type = 'U' )
BEGIN
    --DROP TABLE dbo.tst;
    CREATE TABLE [dbo].[tst]
    (
        Mon [char](6) NULL,
        COL1 [varchar](50) NULL,
        COL2 [varchar](50) NULL,
        COL3 [varchar](50) NULL,
        COL4 [varchar](50) NULL,
        COL5 [varchar](50) NULL
    );

    INSERT INTO dbo.tst ( [Mon] )
    SELECT TOP 100000000
           CONVERT( CHAR( 6 ), DATEADD( DAY, ( ABS( CHECKSUM( NEWID() ) ) % 10000 + 1 ),
                    '20000101' ), 112 )
    FROM sys.all_objects so
    CROSS APPLY sys.all_objects sp;

    CREATE NONCLUSTERED INDEX IX__tst__Mon
        ON dbo.tst ( Mon )
        WITH ( DATA_COMPRESSION = PAGE, FILLFACTOR = 100 );
END;
SELECT Mon, COUNT( 1 )
FROM dbo.tst
GROUP BY Mon
ORDER BY Mon;
SET STATISTICS IO, TIME ON;
SELECT Mon, COL1
FROM dbo.tst
WHERE Mon = '201509'
SELECT Mon, COL1
FROM dbo.tst WITH ( INDEX = IX__tst__Mon )
WHERE Mon = '201509'
SELECT *
FROM dbo.tst
WHERE Mon = '201509'
SELECT *
FROM dbo.tst WITH ( INDEX = IX__tst__Mon )
WHERE Mon = '201509'
SET STATISTICS IO, TIME OFF;
Specifying the hint does, in both cases, reduce the time required for the query to resolve, but the index seek + RID lookup actually results in a significant increase in the number of reads necessary (my test indicated a 60% increase). Obviously it's not a 1:1 trade-off, since the time difference is about 6x; but regardless, left to itself the optimizer chooses the scan.
If you can INCLUDE the columns you need in the index, you'll get the best of both worlds, eliminating the RID lookup and the additional reads.
--DROP INDEX dbo.tst.IX__tst__Mon
CREATE NONCLUSTERED INDEX IX__tst__Mon
    ON dbo.tst ( Mon )
    INCLUDE ( COL1, COL2, COL3, COL4, COL5 )
    WITH ( DATA_COMPRESSION = PAGE, FILLFACTOR = 100, DROP_EXISTING = ON );
SET STATISTICS IO, TIME ON;
SELECT *
FROM dbo.tst
WHERE Mon = '201509'
SET STATISTICS IO, TIME OFF;
Best Answer
The initial seek down the b-tree is to find the first row where CustomerID >= 1. From that point on, the storage engine remembers the current scan position, and returns the next row in index order that qualifies each time a row is requested by a parent plan operator. The scan comes to an end as soon as a row is encountered that does not match the predicate CustomerID <= 70000. The effect is that the 'seek' is an initial seek, followed by a partial ordered scan of the index.
This is usually more efficient than scanning the whole index, even where 90% of the rows are expected to qualify. The key point to get straight in your mind is that a separate b-tree seek is not performed for each row.
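A minimal sketch of the shape of query being described, assuming a hypothetical dbo.Orders table with a b-tree index on CustomerID:

```sql
-- One initial seek positions the scan at the first row with CustomerID >= 1;
-- the engine then reads rows in index order and stops at the first row
-- with CustomerID > 70000. No per-row b-tree traversal takes place.
SELECT CustomerID
FROM dbo.Orders
WHERE CustomerID >= 1
  AND CustomerID <= 70000;
```

In the execution plan this appears as a single Index Seek operator whose seek predicate covers the whole range, not as one seek per qualifying row.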