SQL Server – Why are the statistics getting out of date so quickly?

Tags: azure-sql-database, sql-server, statistics

I've got a large table (300 million rows), with about 1 million rows added per day. This is time-series data, so I have the following setup:

CREATE TABLE Readings (
    Id INT PRIMARY KEY,
    TrackerId INT,
    DateProcessed DATETIME,
    -- About 10 other columns of data
)

CREATE INDEX IX_TrackerId_DateProcessed
ON Readings (
    TrackerId,
    DateProcessed
) INCLUDE (
    -- all the other columns
) WHERE (
    TrackerId IS NOT NULL
)

When we view the data, I usually run something like the following query:

SELECT Latest.*
FROM Trackers
CROSS APPLY (
    SELECT TOP 1 *
    FROM Readings
    WHERE TrackerId = Trackers.Id
    ORDER BY DateProcessed DESC
) Latest

This used to work fine. Now, it takes forever to generate the query plan, even when I explicitly tell it to use the index. The only solution I have found so far is to update the statistics for the table with the following command:

UPDATE STATISTICS Readings

This takes about ten minutes to run, and then the select query runs fine, but only for a day. Then I have to update the statistics again. Why are the statistics getting out of date so quickly, and what can I do to prevent this?

Points worth mentioning:

  • I only ever append to the table, I never update or delete rows
  • There are some old rows with a NULL TrackerId, but no more will be added
  • The database is running at compatibility level 140
  • I can't enable trace flag 2371 in Azure SQL
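
Before updating statistics manually, it can help to see how stale they actually are. A minimal sketch, assuming the table is `dbo.Readings` (the schema name is an assumption), using the `sys.dm_db_stats_properties` DMV, which is available in Azure SQL Database:

```sql
-- Show the age and accumulated modifications of each statistic on the table.
SELECT s.name                  AS stats_name,
       sp.last_updated,        -- when the statistic was last rebuilt
       sp.rows,                -- table cardinality at that time
       sp.modification_counter -- modifications since the last update
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE s.object_id = OBJECT_ID('dbo.Readings');
```

A high `modification_counter` relative to `rows` indicates the next query that compiles against this statistic may trigger an automatic update.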

Additional answers to questions in the comments

The estimated query plan before updating the statistics

The estimated query plan after updating the statistics

The plans before and after updating the statistics look identical to me. The issue is not that I am getting a bad query plan, but how long it takes to get the plan. Before updating the statistics, I had to wait two minutes for the plan. After updating the statistics, it got the plan instantly.

While writing this, I realized it would actually be quicker to wait out the slow compile once and let the plan cache do the work, rather than update the statistics, but that still doesn't solve the problem. This query is executed from within Entity Framework with a timeout of 30 seconds, so two minutes is still too long.

Best Answer

Time spent automatically updating statistics is included in compile time. The stats update takes 10 minutes, and the compile time for a query that triggers an auto stats update includes that time - which is why the query takes over 10 minutes every now and then.

The modification threshold for an automatic stats update, per the documentation, is

sqrt(table cardinality * 1000)

With your numbers (300,000,000 rows), you hit that threshold at 547,722 modifications.

Since 1,000,000 rows are being added every day, you're bound to hit an auto stats update around twice a day.
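
The arithmetic can be checked directly. This is just the threshold formula above evaluated for the stated cardinality, not anything table-specific:

```sql
-- SQRT(1000 * N) for N = 300,000,000 rows
SELECT SQRT(1000.0 * 300000000) AS auto_update_threshold;
-- ≈ 547,722.6 modifications
```

At 1,000,000 inserts per day, the counter crosses that threshold roughly every 13 hours, which matches the observed once-or-twice-daily stalls.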

Since your query's estimates are good both before and after the stats update, one possible solution is to enable the database-level AUTO_UPDATE_STATISTICS_ASYNC setting. This still lets SQL Server update statistics automatically, but the update happens asynchronously on a background thread, so queries compile against the old statistics instead of waiting:

ALTER DATABASE [Live] 
SET AUTO_UPDATE_STATISTICS_ASYNC ON;

Note that this affects the entire database, not just that table / statistic.
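
You can confirm the setting took effect with a quick check against `sys.databases` (the database name `Live` is taken from the ALTER DATABASE statement above):

```sql
-- 1 means asynchronous auto-update is enabled for the database.
SELECT name, is_auto_update_stats_async_on
FROM sys.databases
WHERE name = 'Live';
```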