According to MS docs, the description for AVG_RANGE_ROWS
is:
Average number of rows with duplicate column values within a histogram step, excluding the upper bound. When DISTINCT_RANGE_ROWS is greater than 0, AVG_RANGE_ROWS is calculated by dividing RANGE_ROWS by DISTINCT_RANGE_ROWS. When DISTINCT_RANGE_ROWS is 0, AVG_RANGE_ROWS returns 1 for the histogram step.
I'm looking at the last line and if that is indeed the case, I'm curious to know why I'm seeing a value for AVG_RANGE_ROWS
that is not equals 1
when DISTINCT_RANGE_ROWS
is 0
in the histogram steps.
The statistic in question is a column stat created by SQL Server when the auto create statistics option is turned on. I'm on an older version of the database, but on the latest patch – SQL Server 2014 SP3, CU4+GDR (12.0.6372.1).
It is a bit unfortunate that we almost had a meltdown last week because of a sub-optimal query plan. The end result result was large scans and bloated memory grants. Resampling the statistic with a higher percent value sort of resolved the issue for us for the time being but I'm mostly curious to know if there are exceptions around the initial statement or a known issue (perhaps addressed using a trace flag?) and how do I prevent this from happening again for auto created stats where we have no control over the sampling size?
Best Answer
As described in answer to Badly-formed histogram causes bad estimates on Nested Loop there has been a change to the way sampled statistics are calculated and stored, more particularly when scaling is applied.
As a side-effect, the value for
DISTINCT_RANGE_ROWS
in your case is a fraction between 0 and 1 (980.235 / 386212.6 = 0.002538071). The column has an exposed type ofbigint
, so it is rounded down to zero.Clearly there can't really be zero distinct values when the range contains a non-zero numbers of rows.
One can only hope these discrepancies are cleared up at some stage; though it's hard to imagine exactly what that might look like without a potentially breaking change of data type, extending also to
sys.dm_db_stats_histogram
(available on SQL Server 2016 and later).As for what you do about it, if you are convinced this is not just a display issue and is actually causing poor estimations, you should report it as a regression.