SARG Cardinality Estimate vs Full-Scan in SQL Server

Tags: performance, query-performance, sql-server, sql-server-2008-r2, sql-server-2012

Why is there no full scan (on SQL Server 2008 R2 and 2012)?

Test data:

DROP TABLE dbo.TestTable;
GO
CREATE TABLE dbo.TestTable
(
    TestTableID INT IDENTITY PRIMARY KEY,
    VeryRandomText varchar(50),
    VeryRandomText2 varchar(50)
);
GO
-- Fill the table with 10,000 rows of random numeric strings
SET NOCOUNT ON;
DECLARE @i int;
SET @i = 0;
WHILE @i < 10000
BEGIN
    INSERT INTO dbo.TestTable (VeryRandomText, VeryRandomText2)
        VALUES (CAST(RAND() * 10000000 AS varchar(50)),
                CAST(RAND() * 10000000 AS varchar(50)));
    SET @i = @i + 1;
END
GO
CREATE INDEX IX_VeryRandomText ON dbo.TestTable
(
    VeryRandomText
);
GO

When I execute this query:

SELECT * FROM dbo.TestTable WHERE VeryRandomText = N'111'; -- bad

I get a warning (as expected, because an nvarchar literal is being compared to a varchar column):

<PlanAffectingConvert ConvertIssue="Cardinality Estimate" Expression="CONVERT_IMPLICIT(nvarchar(50),[DemoDatabase].[dbo].[TestTable].[VeryRandomText],0)" />

But then i see execution plan, and i can see, that it is not using full-scan as i would expect, but index seek instead.

[Screenshot: execution plan showing an index seek]

Of course, this is good in a way, because in this particular case execution is much faster than it would be with a full scan.

But I cannot understand how SQL Server arrived at this plan.

Also, if the collation were a Windows collation at the server level but a SQL Server collation at the database level, the same query would cause a full scan.

Best Answer

When comparing values of different data types, SQL Server follows the data type precedence rules. Since nvarchar has higher precedence than varchar, SQL Server has to convert the column data to nvarchar before comparing values. That means applying a function to the column, which would normally make the query non-sargable.
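
As a quick illustration of the precedence rule, SQL_VARIANT_PROPERTY reports the base type that a mixed varchar/nvarchar expression resolves to:

-- Concatenating varchar with nvarchar yields nvarchar,
-- because nvarchar has higher data type precedence:
SELECT SQL_VARIANT_PROPERTY(CAST('abc' AS varchar(10)) + N'def', 'BaseType') AS ResultType;
-- Returns: nvarchar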

SQL Server does, however, do its best to protect you from your mistakes. It uses a technique described by Paul White in the blog post Dynamic Seeks and Hidden Implicit Conversions: it seeks a range of values and then performs the final comparison, with the conversion of the column value to nvarchar, in a residual predicate to filter out any false positives.
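
The easiest way to sidestep the problem entirely, of course, is not to trigger the conversion in the first place, either by using a varchar literal or by casting the literal explicitly. A trivial sketch on the question's test table:

-- Both predicates compare varchar to varchar, so the index seek
-- needs no dynamic range and no residual conversion:
SELECT * FROM dbo.TestTable WHERE VeryRandomText = '111';
SELECT * FROM dbo.TestTable WHERE VeryRandomText = CAST(N'111' AS varchar(50));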

As you have noted, this does not work, however, when the collation of the column is a SQL collation. The reason for that, I believe, can be found in the article Comparing SQL collations to Windows collations.

Basically, a Windows collation uses the same algorithm for varchar and nvarchar, whereas a SQL collation uses a different algorithm for varchar data and the same algorithm as a Windows collation for nvarchar data.

So going from varchar to nvarchar under a Windows collation uses the same algorithm, and SQL Server can produce a range of values from, in your case, an nvarchar literal to get rows from the varchar column's index. However, when the collation of the varchar column is a SQL collation, that is not possible because of the different algorithm used.
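
You can reproduce this yourself. The sketch below (the table and index names here are mine, chosen for the demo) rebuilds the test column with an explicit SQL collation, after which the same nvarchar predicate should produce a scan instead of the dynamic range seek:

-- Same shape as the question's table, but with a SQL collation
-- on the indexed column:
CREATE TABLE dbo.TestTableSqlCollation
(
    TestTableID INT IDENTITY PRIMARY KEY,
    VeryRandomText varchar(50) COLLATE SQL_Latin1_General_CP1_CI_AS
);
CREATE INDEX IX_VeryRandomText_Sql
    ON dbo.TestTableSqlCollation (VeryRandomText);
GO
-- Expect a scan here: no seek range can be derived from the
-- nvarchar literal under the SQL collation.
SELECT * FROM dbo.TestTableSqlCollation WHERE VeryRandomText = N'111';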


Update:

A demonstration of the different sort orders for varchar columns using Windows and SQL collations.

SQL Fiddle

MS SQL Server 2014 Schema Setup:

create table T(C varchar(10));

insert into T values('a-b'),('aa'),('ac');

Query 1:

select C
from T
order by C collate SQL_Latin1_General_CP1_CI_AS;

Results:

|   C |
|-----|
| a-b |
|  aa |
|  ac |

Query 2:

select C
from T
order by C collate Latin1_General_100_CI_AS;

Results:

|   C |
|-----|
|  aa |
| a-b |
|  ac |
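
The same divergence shows up in a single comparison (a small self-contained check, using only the two collations above): the SQL collation sorts the hyphen as an ordinary character, while the Windows collation's word sort gives it lower weight.

-- Which literal sorts first under each collation?
SELECT
    CASE WHEN 'a-b' < 'aa' COLLATE SQL_Latin1_General_CP1_CI_AS
         THEN '''a-b'' first' ELSE '''aa'' first' END AS sql_collation,
    CASE WHEN 'a-b' < 'aa' COLLATE Latin1_General_100_CI_AS
         THEN '''a-b'' first' ELSE '''aa'' first' END AS windows_collation;
-- Expected: sql_collation = 'a-b' first, windows_collation = 'aa' first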