Sql-server – Why can the index seek estimate the right number of rows and the sort operator can’t

indexsql serverstatistics

I've a query that uses a function on the predicate, something like this:

commentType = 'EL'
AND commentDateTime >= DATEADD(month,datediff(month,0,getdate()) - 13,0)

I have a filtered index on commentType that has 40K rows and when I run the query, the estimated number of rows for the Index Seek is very accurate (around 11K), but for the next step (sort operator) it completely ignores the statistics and just estimates the total number of rows in the filtered index.

Why is this happening? I know the basics about sargability, and I've tested just for the sake of sanity replacing the dateadd by an actual date (2014-01-01) and voila… The sort started guessing the number of rows correctly…

Why is this happening and how can I fix it? I can't pass a fixed date…

Best Answer

I believe your estimates are wrong because of an estimator bug that swaps two of the DATEDIFF arguments. I talk about this here:

A workaround is to calculate the first day of 13 months ago without using DATEDIFF (2008+):

DATEADD(MONTH, -13, DATEADD(DAY, 1-DATEPART(DAY,GETDATE()), CONVERT(DATE, GETDATE()));

I'm not positive that will address the estimate (I haven't tested with filtered indexes, and I'm not sure what the sort is actually doing or why it has a different estimate without the plan and/or the rest of the query).

The fix Microsoft recommends is to use TF 4199, but I'm not so sure that's what you'll need to do here:

Another option would be to make sure you're on the absolute latest SP/CU for whatever version of SQL Server you're using, as they claim it is fixed in the following KB article (though this will still require the use of TF 4199 unless you are on 2014 or better):

The fix can be obtained with the following builds:

(Next time, please include the results of SELECT @@VERSION in your question.)

I will note that the KB article says that DATEDIFF can underestimate the number of rows, which is the opposite of what is happening in your scenario. That doesn't mean the fixes don't apply to you; I think the KB article wording is inaccurate, as the estimates can go either way depending on the data and the range you're looking at.

My blog post above confirmed that the swapping no longer occurs in 2014 and up. To be safe, I would probably just scrap DATEDIFF from your predicate and use a different method to calculate the beginning of your range. I don't suggest the overkill of 4199 or using dynamic SQL to prevent the bad swap.