Auto Update Statistics in SQL Server 2008 R2: Why do some statistics remain stale despite a large number of row inserts?

Tags: performance, sql-server, sql-server-2008

During investigation of a slow query it appeared that the execution plan was exceptionally suboptimal (a nested loop performing 9 million executions of a seek where the estimated number of executions was 1). Having confirmed that some relevant statistics were indeed out of date, I rebuilt the stats and the performance problem was effectively resolved.
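
For reference, a minimal sketch of the kind of manual refresh and staleness check involved (dbo.MyTable and the statistic name are placeholders):

    -- Refresh every statistic on one table with a full scan
    UPDATE STATISTICS dbo.MyTable WITH FULLSCAN;

    -- Or refresh a single named statistic or index
    UPDATE STATISTICS dbo.MyTable IX_MyTable_MyColumn WITH FULLSCAN;

    -- Check when each statistic on the table was last updated
    SELECT name, STATS_DATE(object_id, stats_id) AS last_updated
    FROM sys.stats
    WHERE object_id = OBJECT_ID('dbo.MyTable');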

This database has Auto Update Statistics enabled (on by default). I understand there is a threshold for auto stats updates, triggered once 500 + 20% of a table's rows have been modified (updated/inserted/deleted). This threshold appears to have been exceeded by a large degree on multiple indexes, so it appears that either (A) there is a problem with auto updates, or (B) there is more to the update strategy than I have been able to find in the online documentation.
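
On 2008 R2 the approximate modification count since the last stats update can be inspected via the deprecated but still-present sys.sysindexes view; a rough sketch, treating rowmodctr as indicative only, since it is derived differently from SQL Server 2005 onwards:

    -- Tables/stats whose modification counter has passed 500 + 20% of rows
    SELECT o.name AS table_name,
           i.name AS index_or_stat_name,
           i.rowcnt,
           i.rowmodctr
    FROM sys.sysindexes AS i
    JOIN sys.objects AS o ON o.object_id = i.id
    WHERE o.is_ms_shipped = 0
      AND i.rowmodctr > 500 + 0.20 * i.rowcnt;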

I appreciate that a scheduled task can be set up to update stats, and this is likely the approach we will take if no other solution can be found, but it does leave us confused as to why such large numbers of modifications do not trigger an auto update for some stats. Understanding why might help us decide which stats need to be updated by a scheduled task.
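
If we do go the scheduled-task route, a minimal sketch of such a job step (the table name is hypothetical) would be something like:

    -- Update all statistics in the database that have received modifications
    EXEC sys.sp_updatestats;

    -- Or target specific tables known to go stale, with a full scan
    UPDATE STATISTICS dbo.MyHotTable WITH FULLSCAN;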

Some additional notes:

  1. The problem was noted in a database where the data is being created by a load test, so a large quantity of data is added in a short space of time; if the auto update occurred only periodically (e.g. once a day at most), this might explain some of the observed behavior. Also, our load tests tend to stress the database heavily, so I wonder whether SQL Server defers stats updates while under heavy load (and then fails to update the stats for some reason).

  2. In trying to recreate this issue with a test script containing successive INSERT, SELECT and DELETE statements, the problem did not occur. I wonder if the distinction is that these statements each affect many rows per SQL statement, whereas our load test tends to insert rows individually (a sketch of what I mean follows this list).

  3. The DB in question is set to the 'Simple' recovery model.
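
For what it's worth, the row-by-row variant mentioned in note 2 looks roughly like this (table and column names are hypothetical):

    -- Insert rows one at a time, as the load test does, rather than in
    -- a single set-based statement
    DECLARE @i int = 0;
    WHILE @i < 100000
    BEGIN
        INSERT INTO dbo.LoadTestTable (Id, Payload)
        VALUES (NEWID(), REPLICATE('x', 100));
        SET @i += 1;
    END;

    -- Then run a query that uses the statistics and check whether the
    -- auto update fired
    SELECT COUNT(*) FROM dbo.LoadTestTable WHERE Payload LIKE 'x%';
    SELECT name, STATS_DATE(object_id, stats_id) AS last_updated
    FROM sys.stats
    WHERE object_id = OBJECT_ID('dbo.LoadTestTable');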

I have also raised this issue via Microsoft Connect.

UPDATE 2011-06-30:

On further investigation, I believe that the stats that are out of date beyond the threshold levels (e.g. 500 rows + 20%) are stats that are not being utilized by the problem query, so they will probably be updated when a query that requires them is next run. The stats that are used by the query are being updated regularly. The remaining issue, then, is that these stats grossly mislead the query plan optimizer after only relatively few inserts (e.g. causing the aforementioned 9 million or so seeks where the estimate was 1).

My hunch at this time is that the problem is related to a poor choice of primary key: the key is a uniqueidentifier generated by NEWID(), which creates a highly fragmented index very quickly, especially as the default fill factor in SQL Server is 100%. My hunch is that this somehow results in misleading stats after relatively few row inserts, fewer than the threshold for recalculating the stats. This may all be a non-issue, as I generated a lot of data without rebuilding indexes part way through, so the poor stats may simply be a consequence of the resulting very high index fragmentation. I think I need to add SQL Server maintenance cycles into my load test to get a better idea of performance on a real system over long periods of time.
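
As a rough illustration of the fragmentation angle (table names hypothetical): a random GUID clustered key scatters inserts across the index, while NEWSEQUENTIALID() keeps them append-only.

    -- Random GUIDs: new rows land on random pages, causing page splits
    CREATE TABLE dbo.OrdersRandom (
        Id uniqueidentifier NOT NULL DEFAULT NEWID() PRIMARY KEY CLUSTERED,
        Payload varchar(100) NOT NULL
    );

    -- Sequential GUIDs: inserts append at the end of the index
    -- (NEWSEQUENTIALID is only valid as a column default)
    CREATE TABLE dbo.OrdersSequential (
        Id uniqueidentifier NOT NULL DEFAULT NEWSEQUENTIALID() PRIMARY KEY CLUSTERED,
        Payload varchar(100) NOT NULL
    );

    -- Compare fragmentation after a load run
    SELECT OBJECT_NAME(object_id) AS table_name, avg_fragmentation_in_percent
    FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED');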

UPDATE 2012-01-10:

Another factor to consider: two trace flags were added in SQL Server 2005 (and appear to still be present in 2008) to address specific shortcomings related to the occurrence of outdated and/or misleading statistics. The flags in question are:

    DBCC TRACEON(2389);
    DBCC TRACEON(2390);

Ian Jose's MSDN blog: "Ascending Keys and Auto Quick Corrected Statistics"
Fabiano Amorim, "Statistics on Ascending Columns"

You should of course be very careful when deciding to enable these flags as they may have detrimental effects.
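
For completeness, a sketch of how one might trial them, session-scoped first (they can also be set as instance startup parameters, e.g. -T2389):

    -- Enable for the current session only while testing
    DBCC TRACEON (2389);
    DBCC TRACEON (2390);

    -- Or enable globally (affects every session on the instance)
    DBCC TRACEON (2389, -1);
    DBCC TRACEON (2390, -1);

    -- Verify status, and turn them off again afterwards
    DBCC TRACESTATUS (2389, 2390);
    DBCC TRACEOFF (2389, 2390, -1);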

Best Answer

Some info, if not a definitive answer

It's been blogged about recently.

There is a whitepaper too ("Statistics Used by the Query Optimizer in Microsoft SQL Server 2008"). See the section "Maintaining Statistics in SQL Server 2008", which lists some conditions that sound like they affect you. For example:

One limitation of the automatic update logic is that it tracks changes to columns in the statistics, but not changes to columns in the predicate. If there are many changes to the columns used in predicates of filtered statistics, consider using manual updates to keep up with the changes.
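
As a concrete (hypothetical) instance of that limitation: for a filtered statistic like the one below, churn in the predicate column Status does not count towards the auto-update threshold, so a manual refresh is needed.

    -- Changes to Status (the predicate column) are not tracked; only
    -- changes to OrderDate (the statistics column) count
    CREATE STATISTICS st_OpenOrders_OrderDate
    ON dbo.Orders (OrderDate)
    WHERE Status = 'Open';

    -- Keep it fresh manually, e.g. from a scheduled job
    UPDATE STATISTICS dbo.Orders st_OpenOrders_OrderDate WITH FULLSCAN;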

At the end there are some settings to check too: for instance, what if Auto Update Statistics is OFF at the database level, overriding an ON at the index/stat level?
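
A quick way to check both levels via the standard catalog views:

    -- Database-level switches
    SELECT name, is_auto_create_stats_on, is_auto_update_stats_on
    FROM sys.databases
    WHERE name = DB_NAME();

    -- Statistics created or updated WITH NORECOMPUTE opt out individually
    SELECT OBJECT_NAME(object_id) AS table_name, name, no_recompute
    FROM sys.stats
    WHERE no_recompute = 1;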

HTH...