Sql-server – Query behavior – with regard to statistics

azure-sql-databaseindex-statisticssql serverstatistics

A quick background of the issue: We have an application, and we have many instances of this application running for clients. While they may be on slightly different versions, they are fundamentally the same.

Yesterday, one client had an issue with SQL timeouts. Looking at the query we identified an issue with certain tables, and the use of OUTER APPLY and re-wrote it to circumvent the issue.

Inspecting the query plan today, I can clearly see that the stats are bad, as it is expecting circa 2.5 million rows, which is incorrect. I updated the stats and it has resolved the issue, and now expects 30 rows.

My confusion comes from when I inspect the query plan for other clients' databases, and the stats seem off, but, the query is returned in around 1 second, not the 45 seconds seen in the issue that was faced.

Both databases have auto stats turned on. Does this suggest a problem with auto stats on the problem database?

Whilst testing, I did clear the cache DBCC FREEPROCCACHE so the engine had to generate a plan each time. I have not done this on a database that was returning the data in a timely fashion however.

Sorry for the vagueness, I unfortunately cannot share the query plan, due to sensitive information.

Currently, we run only automatic stats updates (no scheduled statistics / index maintenance). This will change; the databases have been neglected somewhat. I should also mention, these databases are in Azure. I am not sure if that changes anything?

Best Answer

Auto-stats and stats per se is only one aspect. Another is the plan, the calculated selectivity, and also the fact that plans can be cached.

So imagine a plan was generated when stats were "ok" (along with other circumstances such as sniffed parameter values). This can now stick around for a while in cache, and even if the stats deteriorate (a bit), your "good plan" can linger in cache.

Then you have the aspect that autostats samples data. The larger the dataset, the smaller the part of that data set that is sampled. So even when auto-stats kicks in, with a larger dataset you may get lower quality stats.