Info
My question relates to a moderately big table (~40GB data space) that is a heap
(Unfortunately, I am not allowed to add a clustered index to the table by the application owners)
An auto created statistic on an Identity column (ID
) was created, but is empty.
- Auto create stats & auto update stats are on
- Modifications have happened in the table
- There are other (auto created) statistics that are getting updated
- There is another statistic on the same column created by an index (duplicate)
- Build: 12.0.5546
The duplicate statistic is getting updated:
The actual question
To my understanding, all stats could be used and modifications are tracked, even if there are two statistics on exactly the same columns (duplicates), so why does this statistic remain empty?
Stats Info
DB stat info
Table Size
Column Information where the statistic is created on
[ID] [int] IDENTITY(1,1) NOT NULL
Identity column
select * from sys.stats
where name like '%_WA_Sys_0000000A_6B7099F3%';
Getting some info on another statistic
select * From sys.dm_db_stats_properties (1802541555, 3)
In comparison with my empty stat:
Stats + Histogram from "generate scripts":
/****** Object: Statistic [_WA_Sys_0000000A_6B7099F3] Script Date: 2/1/2019 10:18:19 AM ******/
CREATE STATISTICS [_WA_Sys_0000000A_6B7099F3] ON [dbo].[table]([ID]) WITH STATS_STREAM = 0x01000000010000000000000000000000EC03686B0000000040000000000000000000000000000000380348063800000004000A00000000000000000000000000
When creating a copy of the stats, no data is inside
CREATE STATISTICS [_WA_Sys_0000000A_6B7099F3_TEST] ON [dbo].[table]([ID]) WITH STATS_STREAM = 0x01000000010000000000000000000000EC03686B0000000040000000000000000000000000000000380348063800000004000A00000000000000000000000000
When manually updating the stat they do get updated.
UPDATE STATISTICS [dbo].[Table]([_WA_Sys_0000000A_6B7099F3_TEST])
Best Answer
I was able to reproduce this, both with an empty statistic, and a populated statistic. I arranged for an automatic statistic to be created on an empty table, and the index was created later:
I found that modifications continue to be tracked accurately on all non-empty duplicates, but only one statistic is updated automatically (regardless of the asynchronous setting).
Automatic statistics updates only occur when the query optimizer needs a particular statistic, and finds that it is out of date (an optimality-related recompile).
The optimizer chooses from duplicate statistics as mentioned in the Plan Caching and Recompilations in SQL Server 2012 paper:
The point being that the optimizer chooses one of the available duplicate statistics (the "best one), and that one is automatically updated if found to be stale.
I believe this is a change in behaviour from older releases - or at least the documentation suggests that all out-of-date statistics for an object would be updated as part of this process, but I have no idea when this changed. It was certainly after August 2013 when Matt Bowler posted Duplicate Statistics, which contains a handy AdventureWorks-based repo. That script now results in only one of the statistics objects being updated, while at the time both were.
The above explanation matches all the behaviours I observed while attempting to reproduce your scenario, but I doubt it is explicitly documented anywhere. It does seem like a sensible optimization, since there is little value in keeping duplicates fully updated.
This is probably all at a level of detail below that which Microsoft are willing to support. This also means it could change without notice.