SQL Server – database statistics degrading after execution

c#, entity-framework, etl, sql-server

I have been experiencing timeouts while querying my SQL Server database, in particular one table. I tried updating the statistics, and that fixed the performance issues for a couple of days.

I started digging deeper and found out that there is a scheduled synchronization task that retrieves data from a web service and updates the database. This task uses Entity Framework for that purpose. Several colleagues have tried to fix this, but so far no luck.
I wanted to take a step back:

  1. Throw EF out of the picture.
  2. This should not be done by a piece of C#/EF code; it is a perfect job for an ETL tool. A delta update should (according to my understanding of DB statistics) do the job. So far, every time the scheduled task runs, everything is being updated.

Can someone provide me with more information or insights into this situation, so that I can try to fix the code before bringing in new tools and solutions?

Best Answer

Starting at the beginning:

SQL Server uses statistics objects to generate cardinality estimates for the result set steps during the optimization phase. Cardinality estimates are used to generate a query plan, which is the logical plan by which SQL Server will physically access and retrieve the data in order to satisfy the request. Generally speaking, more accurate statistics lead to a better plan because SQL Server chooses the logical and physical operators to use based on these cardinality estimates. Good estimates give SQL Server a better chance at choosing the best operation for the amount of data it expects to process.
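
A quick way to see how far those estimates have drifted for your problem queries is to capture a plan with runtime counters and compare estimated and actual row counts. This is just a sketch: dbo.SyncedTable and LoadDate are hypothetical placeholders for your own table and filter.

-- Returns the actual execution plan (as XML) alongside the results, so you can
-- compare EstimateRows vs. ActualRows on the operators that touch the table.
SET STATISTICS XML ON;

SELECT COUNT(*)
FROM dbo.SyncedTable
WHERE LoadDate >= DATEADD(DAY, -1, SYSUTCDATETIME());

SET STATISTICS XML OFF;

A large gap between estimated and actual rows on the same operator is the usual symptom of the statistics problems described below.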

A SQL Server Statistics object consists of a header, a density vector, and a histogram. The density vector and the histogram are used under different circumstances when estimating cardinality.
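
You can look at all three parts with DBCC SHOW_STATISTICS; the object and statistic names below are placeholders.

-- Header, density vector, and histogram in one call.
DBCC SHOW_STATISTICS ('dbo.TableName', 'StatisticsName');

-- Or one section at a time:
DBCC SHOW_STATISTICS ('dbo.TableName', 'StatisticsName') WITH STAT_HEADER;
DBCC SHOW_STATISTICS ('dbo.TableName', 'StatisticsName') WITH DENSITY_VECTOR;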

Modifications (via INSERT, UPDATE, and DELETE) necessarily change the data profile (INSERT and DELETE by modifying the number of rows in the table itself, and UPDATE by modifying the number of rows that satisfy a condition) so any of these "writes" can ultimately affect the optimization and planning phases of query execution. For a minor number of writes, this generally isn't an issue as a few rows here-and-there won't necessarily change the operators that SQL Server uses. A notable exception is if the written data are inserts that "add on" to the existing data set (e.g. an IDENTITY column, or inserting the next day's data). This is called an ascending key and can be problematic because the values exist outside of the boundaries of the histogram and (up until SQL Server 2014) SQL Server has no knowledge of the number of rows that would satisfy a request with the new key as a filter value. Sometimes you'll hear about statistics that have taken many modifications as "stale."
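
To make the ascending key pattern concrete, here is a minimal sketch (the table and column names are hypothetical) of the kind of load a histogram can't see until statistics are updated:

-- Hypothetical sync target: every run appends rows with ever-increasing key values.
CREATE TABLE dbo.SyncDemo
(
    SyncId   int IDENTITY(1,1) PRIMARY KEY,   -- ascending surrogate key
    LoadDate datetime2 NOT NULL,              -- ascending "next day's data" column
    Payload  nvarchar(200) NULL
);

-- Once statistics exist, later loads insert values above the highest RANGE_HI_KEY
-- in the histogram, so a filter like this one gets a very low cardinality estimate
-- until the statistics are refreshed.
SELECT Payload
FROM dbo.SyncDemo
WHERE LoadDate >= DATEADD(DAY, -1, SYSUTCDATETIME());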

Because you've not shared the nature of your scheduled synchronization task, I can't tell you why updating statistics helps in your case. It could be the ascending key scenario, in which case the update will include the new key values in the histogram. Or it could be that you've squeezed in enough data to affect the cardinality of the result set for an existing value, but your cached plan is not fit for this new amount of data. It also depends on the queries with which you're experiencing this performance degradation: are they queries against the new data, the old data, or all of it? Additionally, I've seen some poor SQL generated by Entity Framework, which adds layers of complexity to the optimization and planning phase, so when that poor SQL is combined with "stale" statistics you get a bad plan that degrades over time (if you're lucky). So calls to scrap EF are not bad advice, but let's stay focused on the statistics portion.
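
If the synchronization task does turn out to be the trigger, one targeted mitigation (shown here only as a sketch; dbo.SyncedTable stands in for your affected table) is to refresh that table's statistics, and optionally force recompilation of dependent plans, as the last step of the task:

-- Refresh every statistic on the table once the sync has finished loading data.
UPDATE STATISTICS dbo.SyncedTable WITH FULLSCAN;   -- consider SAMPLE n PERCENT on very large tables

-- Mark all cached plans that reference the table for recompilation on next use.
EXEC sys.sp_recompile N'dbo.SyncedTable';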

Depending on your version of SQL Server, updating statistics might cause plan cache invalidation. If you're running 2005 through 2008 R2, invalidation will only occur for relevant plans if the AUTO_UPDATE_STATISTICS database option is enabled on the database. On 2012 and later, a stats update only invalidates a plan if there were actually modifications to the stats objects. What this means is that blindly running a stats update (especially on 2012 and newer) won't give you a better plan if you've not modified any data.
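
You can check the relevant database options with a quick catalog query ('YourDatabase' is a placeholder):

-- Statistics-related options for the database in question.
SELECT
    name,
    is_auto_create_stats_on,
    is_auto_update_stats_on,
    is_auto_update_stats_async_on
FROM sys.databases
WHERE name = N'YourDatabase';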

So how can you check to see if you've modified any data? Here's a query that I use (of particular interest to you would be the ModificationCounter column):

-- One row per statistic, joined to sys.dm_db_stats_properties; the
-- ModificationCounter column shows how many leading-column modifications
-- have accumulated since the statistic was last updated.
select 
    SchemaName = sch.name,
    TableName = t.name,
    StatName = s.name,
    IndexName = i.name,
    StatLeadingColumn = index_col(quotename(sch.name)+'.'+quotename(t.name),s.stats_id,1),
    StatColumns = sc.StatColumns,
    IsFiltered = s.has_filter,
    StatFilter = s.filter_definition,
    IsAutoCreated = s.auto_created,

    StatLastUpdated = sp.last_updated,
    ModificationCounter = sp.modification_counter,
    StatRows = sp.rows,
    StatRowsSampled = sp.rows_sampled,
    StatUnfilteredRows = sp.unfiltered_rows,
    StatSteps = sp.steps
from sys.stats s 
join sys.tables t 
    on s.object_id = t.object_id
join sys.schemas sch 
    on t.schema_id = sch.schema_id
-- per-statistic properties: rows, rows sampled, last update, modification counter
cross apply sys.dm_db_stats_properties(s.object_id,s.stats_id) sp
-- build a comma-separated list of the columns in the statistic
cross apply (
                select
                stuff(
                        (
                            select ', '+c.name
                            from sys.stats_columns scol
                            join sys.columns c
                                on scol.object_id = c.object_id
                                and scol.column_id = c.column_id
                            where scol.object_id = s.object_id
                                and scol.stats_id = s.stats_id
                            for xml path(''), type
                        ).value('.','nvarchar(max)')
                        ,1,2,''
                    ) StatColumns
            ) sc
-- a statistic that backs an index shares the index's name
left join sys.indexes i 
    on s.object_id = i.object_id
    and s.name = i.name

In order to run the above, you're going to need to be a member of the db_owner or db_ddladmin fixed database role. The sysadmin server role can obviously run this as well, but I don't generally advocate for randomly adding folks to that server role.
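
A quick way to verify whether your login qualifies, using built-in permission functions:

-- Each function returns 1 if the current user is a member of the role,
-- 0 if not, and NULL if the role doesn't exist in this context.
SELECT
    IS_ROLEMEMBER('db_owner')    AS IsDbOwner,
    IS_ROLEMEMBER('db_ddladmin') AS IsDbDdlAdmin,
    IS_SRVROLEMEMBER('sysadmin') AS IsSysAdmin;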

To check to see if a filter value is in the histogram, you can run

DBCC SHOW_STATISTICS('dbo.TableName','StatisticsName') WITH HISTOGRAM;

That will output the histogram step boundary values (RANGE_HI_KEY). If the filter value is greater than the last value for this column, then it's outside the boundaries of the histogram and you are at risk of cardinality estimates that are quite incorrect.
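
On a recent build (SQL Server 2016 SP1 CU2 or later) you can do the same check programmatically with sys.dm_db_stats_histogram; the object and statistic names below are placeholders.

-- Highest histogram step for a given statistic; compare its range_high_key
-- with the current MAX() of the leading column to spot the ascending-key gap.
SELECT TOP (1)
    h.step_number,
    h.range_high_key
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_histogram(s.object_id, s.stats_id) AS h
WHERE s.object_id = OBJECT_ID(N'dbo.TableName')
  AND s.name = N'StatisticsName'
ORDER BY h.step_number DESC;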

There's also a caveat to the AUTO_UPDATE_STATISTICS behavior: the thresholds that trigger an automatic update depend on the number of rows in the table (a worked check follows the list below):

  • If a table has 0 rows, the statistics will be "stale" (invalidated) when any rows are added.

  • If the table has <= 500 rows (when statistics were last collected), the statistics will be invalidated after more than 500 modifications to the leading column of the statistic.

  • If the table has > 500 rows (when statistics were last collected), the statistics will be invalidated after (500 + 20% of table cardinality) modifications to the leading column.
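
Here is the check mentioned above: a sketch that computes the classic (pre-trace-flag-2371) threshold per statistic from dm_db_stats_properties and compares it to the current modification counter. dbo.TableName is a placeholder.

-- Classic auto-update threshold per statistic vs. modifications since the last update.
SELECT
    s.name                  AS StatName,
    sp.rows                 AS RowsAtLastUpdate,
    sp.modification_counter AS ModificationCounter,
    CASE
        WHEN sp.rows IS NULL OR sp.rows = 0 THEN 1                -- empty table: any insert counts
        WHEN sp.rows <= 500 THEN 500
        ELSE 500 + CAST(0.20 * sp.rows AS bigint)
    END                     AS ClassicThreshold
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE s.object_id = OBJECT_ID(N'dbo.TableName');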

When these thresholds are met, the statistic and the relevant query plan(s) are invalidated. The next time a query is executed, SQL Server sees the invalidation and attempts to load statistics to come up with a new plan; because the statistics themselves are invalid, it first samples the data at the default rate before proceeding to the next step of optimization and planning. You can avoid paying that sampling cost inline by enabling the AUTO_UPDATE_STATISTICS_ASYNC database option, but then you're stuck with the stale query plan for another execution.
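
Enabling the asynchronous option is a single statement (the database name is a placeholder, and the trade-off described above still applies):

-- Requires AUTO_UPDATE_STATISTICS to remain ON; the stats update runs in the
-- background while the current execution keeps using the existing plan.
ALTER DATABASE [YourDatabase] SET AUTO_UPDATE_STATISTICS_ASYNC ON;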

Trace Flag 2371 has also been suggested as a potential fix. For large tables it replaces the fixed 20% threshold above with a dynamic, decreasing one, triggering an update once the modification counter hits SQRT(1000 * table cardinality) (buried in this whitepaper).
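
If you decide to try it, the flag can be enabled globally at runtime or as a startup parameter; note that from SQL Server 2016 onward (compatibility level 130 and above) this dynamic threshold is the default behavior and the flag is no longer needed.

-- Enable trace flag 2371 globally until the next service restart.
DBCC TRACEON (2371, -1);

-- Verify the flag's status.
DBCC TRACESTATUS (2371);

For a permanent setting on older versions, add -T2371 as a SQL Server startup parameter instead.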