You've actually got a few questions in here, so I'll break 'em out individually.
My problem/question is that the estimated number of rows for the above
query is 256, while the actual number of rows is 560K. I want to understand
why there is such a big difference between these two numbers.
In order to answer that question, the first thing we would need is the actual execution plan for the query. In SQL Server Management Studio, you can get that by clicking Query, Include Actual Execution Plan. Run the query, click the Execution Plan tab, right-click anywhere in the whitespace, and choose Save Execution Plan As. Save the file and post it somewhere for people to download and examine.
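If it's easier to capture the plan from a script than from the menus, you can also turn on the XML form of the actual plan; a minimal sketch, using a table called dbo.TaskExec as a stand-in for the real query:

set statistics xml on

select TaskExecStatusID, TaskExecUpdatedDate  -- stand-in for the real query
from dbo.TaskExec
where TaskExecUpdatedDate is null

set statistics xml off

Each statement then returns an extra result set containing the actual execution plan as XML, which you can save as a .sqlplan file.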
The next thing we would need is the output from DBCC SHOW_STATISTICS for the stats on that table. You've hinted at the output, and that's a good start, but the raw output will help us understand exactly what's going on.
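For reference, the call looks like this; the index name here is a guess, since the real name isn't given (for index-backed statistics, the statistics name matches the index name):

dbcc show_statistics ('dbo.TaskExec', IX_TaskExecStatusID_TaskExecUpdatedDate)

That returns three result sets: the header, the density vector, and the histogram.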
If I run DBCC SHOW_STATISTICS, the density section has both columns
in it, but the histogram does not. Does SQL Server produce a histogram for
the combination of columns in multi-column statistics?
No. For multi-column statistics, the histogram is built only on the leading (first) column of the key; the density vector is what covers the column combinations.
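You can see that split directly with the WITH options (same caveat: the names below are placeholders):

dbcc show_statistics ('dbo.TaskExec', IX_TaskExecStatusID_TaskExecUpdatedDate) with density_vector -- one row per left-based prefix of the key
dbcc show_statistics ('dbo.TaskExec', IX_TaskExecStatusID_TaskExecUpdatedDate) with histogram      -- steps on the leading column only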
I have an index on (TaskExecStatusID,TaskExecUpdatedDate),
If you frequently run the query in the example (with TaskExecUpdatedDate IS NULL), then you might check out filtered indexes. They're a new feature in SQL Server 2008 that lets you put a WHERE clause on your index, basically.
http://sqlfool.com/2009/04/filtered-indexes-what-you-need-to-know/
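A sketch of what that might look like for this case; the table and index names are assumptions, since they aren't shown in the question:

create nonclustered index IX_TaskExec_StatusID_UpdatedDateIsNull
on dbo.TaskExec (TaskExecStatusID)
where TaskExecUpdatedDate is null

The filtered index only contains the rows matching the WHERE clause, so it stays small and its statistics describe exactly that slice of the data.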
No statistics are generated on XML columns. The estimates are guesses based on the expressions used when querying the XML.
Using this table:
create table T(XMLCol xml not null)
insert into T values('<root><item value = "1" /></root>')
And this rather simple XML query:
select X.N.value('@value', 'int')
from T
cross apply T.XMLCol.nodes('root/item') as X(N)
That will give you one row back, but the estimated number of rows returned is 200. It will be 200 regardless of what XML, or how much of it, you stuff into the XML column for that one row.
This is the query plan with the estimated row count displayed.
A way to improve, or at least change, the estimates is to give the query optimizer some more information about the XML. In this case, because I know that root really is a root node in the XML, I can rewrite the query like this.
select X2.N.value('@value', 'int')
from T
cross apply T.XMLCol.nodes('root[1]') as X1(N)
cross apply X1.N.nodes('item') as X2(N)
That will give me an estimate of 5 rows returned.
The rewrite of the query will probably not speed up the shredding of the XML, but if the estimates are better, chances are that the query optimizer can make smarter decisions for the rest of the query.
I have not found any documentation on what the rules for the estimates are, other than a presentation by Michael Rys where he says:
Base cardinality estimate is always 10’000 rows!
Some adjustment based on pushed path filters
Best Answer
If you include cd_key in your index, it won't have to do those lookups and the estimates will have less of an impact. Otherwise, correcting that estimate is going to cause it to do a Clustered Index Scan, which could also be slow. cd_camp is already in there, as it's part of the clustered index key. But really, the index you want here is:
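The exact definition depends on the query, but as a rough sketch of the idea (every name other than cd_key and cd_camp is a placeholder, and the key column stands in for whatever the existing index is keyed on):

create nonclustered index IX_hypothetical
on dbo.SomeTable (SomeFilterColumn)  -- placeholder for the existing key column(s)
include (cd_key)                     -- carried in the leaf so the lookups go away

cd_camp rides along automatically, because clustered key columns are present in every nonclustered index.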