Sql-server – Double range query on one column uses wrong seek predicate

sql server

This problem comes up in the real world, but is already present in the following trivial example. I am using SQL Server 2017.

Given a table with only one non-id column, which takes only one value:

create table #test (id bigint not null, status smallint not null)

/* Populate with 10000 times value 10 */
;WITH numbers(Number) AS
(SELECT 1 AS Number
 UNION ALL
 SELECT Number+1 FROM numbers where Number<10000
)

insert into #test (id,status)
select number,10 from numbers option(maxrecursion 10000)

/* Create fresh index */
create index index_status on #test ([status])
DBCC SHOW_STATISTICS ("tempdb..#test", 'index_status') WITH HISTOGRAM   

Now we query a double range on this column (0 results, since the only value is outside both ranges):

select 1
from #test
where status < 10
and status <> 2

The chosen execution plan uses an Index Seek, with

  • Seek Predicate "status < 2 AND status > 2"
  • Predicate "status < 10"

and consequently needs to read all 10000 rows. This surprises me, since I can read off the index statistics to conclude that a Seek Predicate on "status < 10" would be more efficient, since this already filters out all rows.

Question: Why are Seek Predicate and Predicate chosen 'the wrong way'?

I am not interested in alternatives or workarounds, I am only interested in understanding this choice of execution plan. For example we can rewrite the query using a UNION ALL on both range queries seperately, forcing two efficient Seek Predicates:

select 1
from #test
where status < 2

union all 

select 1
from #test
where status > 2 and status < 10

Best Answer

Running

dbcc traceon (3604,8606)

And then the query with a #temp table the final tree before cost based optimisation looks as follows (with the NE/ <> predicate first and the LT/ < predicate second) - the deviation of the predicate order from the "Input tree" happens from the "Simplified Tree" stage and stays that way in subsequent trees.

*** Tree After Project Normalization ***
        LogOp_Project

            LogOp_Select

                LogOp_Get TBL: #test #test TableID=-1583795211 TableReferenceID=0 IsRow: COL: IsBaseRow1001 

                ScaOp_Logical x_lopAnd

                    ScaOp_Comp x_cmpNe

                        ScaOp_Identifier QCOL: [#test].status

                        ScaOp_Const TI(smallint,ML=2) XVAR(smallint,Not Owned,Value=2)

                    ScaOp_Comp x_cmpLt

                        ScaOp_Identifier QCOL: [#test].status

                        ScaOp_Const TI(smallint,ML=2) XVAR(smallint,Not Owned,Value=10)

            AncOp_PrjList 

                AncOp_PrjEl COL: Expr1003 

                    ScaOp_Const TI(int,ML=4) XVAR(int,Not Owned,Value=1)

With a permanent table the tree looks as follows.

*** Tree After Project Normalization ***
        LogOp_Project

            LogOp_Select

                LogOp_Get TBL: test test TableID=645577338 TableReferenceID=0 IsRow: COL: IsBaseRow1001 

                ScaOp_Logical x_lopAnd

                    ScaOp_Comp x_cmpLt

                        ScaOp_Identifier QCOL: [tempdb].[dbo].[test].status

                        ScaOp_Identifier COL: ConstExpr1004 

                    ScaOp_Comp x_cmpNe

                        ScaOp_Identifier QCOL: [tempdb].[dbo].[test].status

                        ScaOp_Identifier COL: ConstExpr1005 

            AncOp_PrjList 

                AncOp_PrjEl COL: Expr1003 

                    ScaOp_Const TI(int,ML=4) XVAR(int,Not Owned,Value=1)

The second case is quite fragile. Adding a query hint - such as OPTION (RECOMPILE) it reverts to the first case. It seems to retain the initial order of the predicates, as written in the query, in the cases that simple parametrisation is attempted (this is not attempted for the temp table #test).

enter image description here

select 1 from test where status <> 2 and status < 10 gives the "bad" plan again with the seek on <> 2 and residual on < 10

enter image description here

My supposition is that the index matching code just picks the first seekable predicate available in input tree order in the rare case that there are multiple competing seek predicates for the same column.

Specifically the SelPredNorm transformation rule appears to cause the predicate re-ordering. The following gives the desired plan.

select 1
from #test
where status < 10
and status <> 2
option (queryruleoff SelPredNorm);

As for why this does not kick in (in this case) with auto parameterised queries that question is still not addressed.