Sql-server – Histogram Skew on Clustered Index

cardinality-estimatessql serverstatistics

Problem

I've been attempting to optimize a query generated from Entity Framework, and noticed that the execution plan shows wildly inaccurate estimations. After a little digging, I noticed that the statistics objects for some of the clustered indexes is very skewed. Below is a snip of the top operators for the query, ordered by actual rows:

snip of actual versus estimated rows in query
(source: imgh.us)

EDIT: Redacted the database name.

The third operator from the top (the one I have selected) is a clustered index seek, and when I look at the statistics object for that particular index I see an absurd amount of skew:

terribly skewed clustered index statistics
(source: imgh.us)

Is there a reason for this? I'm confused as to why SQL Server is only generating 3 steps for this histogram with a full scan. With 225,000+ rows in the table, I would expect to see a more steps with smaller ranges, but what I'm actually seeing is nearly every row included in one step. Rebuilding indexes and updating stats does not redistribute this.

In addition to that particular index, it appears many other clustered index statistics have a similar problem, and provide similarly bad estimations. Another operator in the above list is a clustered index scan on the WorkOrder table, and it also produces a relatively bad estimation. Though it appears this is the cardinality estimator picking 30% of the rows, so perhaps it is an issue with parameter sniffing (or something other than statistics).

Basically, my questions are as follows:

  • What should clustered index statistics look like?
  • Is the above distribution expected?
  • If it is expected, how do I work around these bad estimates?
  • If it is not expected, how do I correct these bad statistics?

EDIT2: Here is the XML Plan: https://www.brentozar.com/pastetheplan/?id=rJTjT4zze

Best Answer

Your statistics are fine.

It is telling you that you have

  • exactly one row with value 1
  • 1 row with value 213152
  • 1 row with value 213151
  • 213,144 distinct values that are > 1 and < 213,151 (so basically one row for each of the 213,149 possible integer values between that with 5 missing)