Sql-server – Despite STATISTICS is updated, Estimated and Actual Row count is not same

cardinality-estimatessql serversql-server-2008-r2statistics

I am executing a SELECT statement to retrieve 2 columns with filter on datetime column from a table having 27451373 rows.

declare @LastProcessedDate datetime='08-15-2016'

select Bill_Code,CLIENT_ID_6,u2_id
from [ODS_HRP].dbo.EMPLOYEE_BILL_VCHR_CODE_ALLOCATIONS
where time_stamp >= @LastProcessedDate

In execution plan I found big gap between Estimated and Actual Number of rows.
Hence I update statistics 2 time in a week, I updated all statistics of the table again and executed same SELECT statement.

But, it is still same.

Here is the execution plan.

enter image description here

And last stats update date.

enter image description here

Thanks

Best Answer

As others have pointed out the most immediate problem is your use of the local variable. SQL Server creates the query plan for that query before it assigns a value to the local variable. You have an unknown input filtered using the >= operator. The legacy cardinality estimator in SQL Server has a rule that treats this as a special case. The cardinality estimate will always be 30% of the number of rows in the table with a non-null value for the filter column.

I don't know many NULL values you have but if you do the math it almost works out exactly:

27451373 * 0.3 = 8235411.9

There are a few ways around this problem. You can use an OPTION (RECOMPILE) hint. That will force SQL Server to never use a cached query plan and it will create the query plan after the variable has been assigned. Most of the time you will see the actual value for the variable in the plan. The unknown estimate of 30% will no longer be used. One exception that I know of is SQL Server may parameterize trivial queries, even those that use a RECOMPILE hint. In that case you may not see the variable value in the plan. I assume that this table is part of a larger query. If so you should not have that problem.

Other options are to hardcode your filter values into the query, use dynamic SQL, or to use a parameterized query. My preference is the RECOMPILE hint unless your Unless your query is executed so many times per day that you cannot afford to recompile the plan.

I also want to point out that it's relatively uncommon for the estimated number of rows to exactly match the actual number of rows. To get an exact match you typically need to work with a simple query and/or you need to get lucky with statistics.