I was researching something else when I came across this thing. I was generating test tables with some data in it and running different queries to find out how different ways to write queries affects execution plan. Here is the script that I used to generate random test data:
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID('t') AND type in (N'U'))
DROP TABLE t
GO
CREATE TABLE t
(
c1 int IDENTITY(1,1) NOT NULL
,c2 int NULL
)
GO
insert into t
select top 1000000 a from
(select t1.number*2048 + t2.number a, newid() b
from [master]..spt_values t1
cross join [master]..spt_values t2
where t1.[type] = 'P' and t2.[type] = 'P') a
order by b
GO
update t set c2 = null
where c2 < 2048 * 2048 / 10
GO
CREATE CLUSTERED INDEX pk ON [t] (c1)
GO
CREATE NONCLUSTERED INDEX i ON t (c2)
GO
Now, given this data, I invoked the following query:
select *
from t
where
c2 < 1048576
or c2 is null
;
To my great surprise, the execution plan that was generated for this query, was this. (Sorry for the external link, it's too large to fit here).
Can someone explain to me what's up with all these "Constant Scans" and "Compute Scalars"? What's happening?
|--Nested Loops(Inner Join, OUTER REFERENCES:([Expr1010], [Expr1011], [Expr1012]))
|--Merge Interval
| |--Sort(TOP 2, ORDER BY:([Expr1013] DESC, [Expr1014] ASC, [Expr1010] ASC, [Expr1015] DESC))
| |--Compute Scalar(DEFINE:([Expr1013]=((4)&[Expr1012]) = (4) AND NULL = [Expr1010], [Expr1014]=(4)&[Expr1012], [Expr1015]=(16)&[Expr1012]))
| |--Concatenation
| |--Compute Scalar(DEFINE:([Expr1005]=NULL, [Expr1006]=NULL, [Expr1004]=(60)))
| | |--Constant Scan
| |--Compute Scalar(DEFINE:([Expr1008]=NULL, [Expr1009]=(1048576), [Expr1007]=(10)))
| |--Constant Scan
|--Index Seek(OBJECT:([t].[i]), SEEK:([t].[c2] > [Expr1010] AND [t].[c2] < [Expr1011]) ORDERED FORWARD)
Best Answer
The constant scans each produce a single in-memory row with no columns. The top compute scalar outputs a single row with 3 columns
The bottom compute scalar outputs a single row with 3 columns
The concatenation operator Unions these 2 rows together and outputs the 3 columns but they are now renamed
The
Expr1012
column is a set of flags used internally to define certain seek properties for the Storage Engine.The next compute scalar along outputs 2 rows
The last three columns are defined as follows and are just used for sorting purposes prior to presenting to the Merge Interval Operator
Expr1014
andExpr1015
just test whether certain bits are on in the flag.Expr1013
appears to return a boolean column true if both the bit for4
is on andExpr1010
isNULL
.From trying other comparison operators in the query I get these results
From which I infer that Bit 4 means "Has start of range" (as opposed to being unbounded) and Bit 16 means the start of the range is inclusive.
This 6 column result set is emitted from the
SORT
operator sorted byExpr1013 DESC, Expr1014 ASC, Expr1010 ASC, Expr1015 DESC
. AssumingTrue
is represented by1
andFalse
by0
the previously represented resultset is already in that order.Based on my previous assumptions the net effect of this sort is to present the ranges to the merge interval in the following order
The merge interval operator outputs 2 rows
For each row emitted a range seek is performed
So it would appear as though two seeks are performed. One apparently
> NULL AND < NULL
and one> NULL AND < 1048576
. However the flags that get passed in appear to modify this toIS NULL
and< 1048576
respectively. Hopefully @sqlkiwi can clarify this and correct any inaccuracies!If you change the query slightly to
Then the plan looks much simpler with an index seek with multiple seek predicates.
The plan shows
Seek Keys
The explanation for why this simpler plan cannot be used for the case in the OP is given by SQLKiwi in the comments to the earlier linked blog post.
An index seek with multiple predicates cannot mix different types of comparison predicate (ie.
Is
andEq
in the case in the OP). This is just a current limitation of the product (and is presumably the reason why the equality test in the last queryc2 = 0
is implemented using>=
and<=
rather than just the straightforward equality seek you get for the queryc2 = 0 OR c2 = 1048576
.