I am looking at the query plan produced by SQL Server 2008 R2 and I'm not sure I'm interpreting it properly. Examples:
<RelOp NodeId="2" PhysicalOp="Sort" LogicalOp="Sort" EstimateRows="1" EstimateIO="0.0112613"
EstimateCPU="0.000100822" AvgRowSize="822" EstimatedTotalSubtreeCost="1.06489" Parallel="0"
EstimateRebinds="0" EstimateRewinds="0">
...
<MemoryFractions Input="0.5" Output="1" />
<RunTimeInformation>
<RunTimeCountersPerThread Thread="0" ActualRows="10190123" ActualRebinds="1" ActualRewinds="0" ActualEndOfScans="1" ActualExecutions="1" />
</RunTimeInformation>
and
<RelOp NodeId="6" PhysicalOp="Index Seek" LogicalOp="Index Seek" EstimateRows="1" EstimateIO="1"
EstimateCPU="0.0502411" AvgRowSize="24" EstimatedTotalSubtreeCost="1.05024" TableCardinality="4.93715e+009"
Parallel="0" Partitioned="1" EstimateRebinds="0" EstimateRewinds="0">
...
<RunTimeInformation>
<RunTimeCountersPerThread Thread="0" ActualRows="10190123" ActualEndOfScans="1" ActualExecutions="1" />
</RunTimeInformation>
Specifically, seeing EstimateRows="1"
and ActualRows="10190123"
I suspect that the optimizer incorrectly estimates the number of rows that will be processed. Subsequently, underestimated row set for the sort might lead to a major sort spill.
Edit:
The query in question looks similar to
select whatever from mytable
where integer_column1 > 848484884 and
integer_column1 < 949494949 and
another_int_col in (5, 8, 15, 20)
where the cardinality of integer_column1
is in millions and that of another_int_col
is a few dozen (a flag of sorts). There is a non-unique, non-clustered index on (integer_column1, another_int_col)
. I wonder if even in the presence of updated statistics the optimizer can use the histogram/density vector of that index though.
Does my reasoning make sense?
(I don't have access to the actual system or the Management Console for that matter — only the XML plan)
Best Answer
To answer my own question, my original understanding was correct: the discrepancy between
EstimateRows="1"
andActualRows="10190123"
was an indication of incorrect optimizer estimates, caused by stale statistics (as suggested by commenters). With updated statistics the node in the plan corresponding to node 6 in the question looks like so:Although the problem could be seen in the XML plan representation directly, for those who want a graphical representation there exist a free tool that does not require the SQL Server Management Studio, SQL Sentry Plan Explorer, or even Windows for that matter -- it is an XSL stylesheet created by the StackOverflow user Justin.