Sql-server – Why/when does the SQL Server query optimizer use EQ_ROWS

sql serverstatistics

I'm trying to understand how exactly the statistics work.

I've got a clustered index on my date. I can see a convert_implicit is executed when I run the following query.

Don't know why it's doing that.

select date
from sales
where date = '20141204'

Statistics say:

Rows: 23818426
Rows Sampled : 216564
All density : 0,0008880995

So the density vector (that this query is not using) : 21153,13

Now for '20141204' my histogram is showing the following

RANGE_HI_KEY    RANGE_ROWS  EQ_ROWS DISTINCT_RANGE_ROWS AVG_RANGE_ROWS
2014-10-30  86723,38    38235,36    3   28770,91
2014-12-04  110856,1    36372,89    16   6902,33
2014-12-18   73389,8    68144,39    6   12180,92

I have my problem now understanding, why the optimzer chooses EQ_ROWS with the value of 36372,89 for my estimated rows. The actual rows are 9472.

I've freshly updated my statistics.

So now I'm wondering, if my histogram is just wrong because the I've not scanned all the rows with my statistics, but just 216564.

Sure enough when I query for 2014-12-05, I'll get the expected 12180,9 for estimated rows.

Why is the optimizer using EQ_ROWS for 2014-12-04?

And as for the convert_implicit – why is it converting? I thought the syntax 20141204 is a standard notation. Can we avoid the conversion in any way?

Best Answer

My best guess is, the optimizer uses EQ_ROWS every time the predicate equals (hence the EQ) the value in the sampled histogram.