Sql-server – Estimated vs. Actual rows and multi-column statistics

sql serverstatistics

I'm trying to understand multi column indexes and statistics, here is my real world example:

Table: TaskExecutions, size=~1 million rows.
Query SARGS: TaskExecStatusID = 3 AND TaskExecUpdatedDate IS NULL

Distribution:: TaskExecStatusID: 1=160k rows, 2=370k rows, 3=560k rows.  
TaskExecUpdatedDate: Null=171k rows.

I have an index on (TaskExecStatusID,TaskExecUpdatedDate),

My problem/question is that the estimated number of rows for the above query is 256, the actual number of rows is 560K.

I want to understand why there is such a big difference between these two numbers?

Stats are up to date, the stats for the index are multi-column stats. I've tried creating in index/stats on (TaskExecUpdatedDate, TaskExecStatusID), and the estimate/actual does not change.

If I run a DBCC SHOW_Statistics, the density section has both columns in it, the histogram does not. Does SQL server produce a histogram for the combination of columns in multi-column statistics?

dave

Best Answer

You've actually got a few questions in here, so I'll break 'em out individually.

My problem/question is that the estimated number of rows for the above query is 256, the actual number of rows is 560K. I want to understand why there is such a big difference between these two numbers?

In order to answer that question, the first thing we would need is the actual execution plan for the query. In SQL Server Management Studio, you can get that by clicking Query, Include Actual Execution Plan. Run the query, click on the Execution Plan tab, and right-click anywhere in the whitespace to click Save Plan As. Save that, and post it somewhere for people to download and examine.

The next thing we would need is the output from DBCC SHOW_STATISTICS for the stats on that table. You've hinted at the output, and that's a good start, but the raw output will help us understand exactly what's going on.

If I run a DBCC SHOW_Statistics, the density section has both columns in it, the histogram does not. Does SQL server produce a histogram for the combination of columns in multi-column statistics?

No.

I have an index on (TaskExecStatusID,TaskExecUpdatedDate),

If you frequently use the query in the example (with TaskExecUpdateDate IS NULL), then you might check out filtered indexes. They're a new feature in SQL Server 2008 that allows you to put a where clause on your index, basically.

http://sqlfool.com/2009/04/filtered-indexes-what-you-need-to-know/