Consider the scenario where auto created statistics are created on a column to satisfy a query.
Afterwards; a nonclustered index is created on the column, with or without other key columns.
Could the SQL Server optimizer leverage indexes to update these auto created statistics when running UPDATE STATISTICS
statements?
Is there a difference between (default) sample rates & fullscan statistic updates when adding these indexes?
SELECT @@VERSION;
Microsoft SQL Server 2017 (RTM-CU12) (KB4464082) - 14.0.3045.24 (X64) Oct 18 2018 23:11:05 Copyright (C) 2017 Microsoft Corporation Developer Edition (64-bit) on Windows Server 2016 Datacenter 10.0 <X64> (Build 14393: ) (Hypervisor)
Best Answer
TL;DR
Auto created statistics will be improved by an index if you are using fullscan updates & have the column as the first key column in an index.
(Default) sample rate statistic updates will use the clustered index / heap table scan even when a NCI exists with the column as the first key column.
Even when updating the statistic of the NCI.
As such, updating statistics with a sample rate will not be positively impacted by adding an index.
Setup
Creating a table with a primary key & 10M rows:
A
nchar(500)
value is used to show a clear difference between statistic updates.We can create an auto created statistic on the val column with this query:
Full scan stat updates
Updating the statistic with fullscan:
Results in 127 seconds of cpu tme on my machine:
Now when we look at the query that the stat updates actually runs:
We can see that it went parallel on the two cores on my machine & did order by the
val
column.As a side note, there was also a memory grant & tempdb was used to support the ordering, The memory grant was as high as 12484 MB or ~ 12 GB
Let's see what happens when we create this index:
And re-run the stat update:
This results in an execution time of
And the memory grant & tempdb usage is gone since it does not have to sort anymore.
The statistic update with fullscan on the auto created stat is able to use the index.
This would also work for an index like this:
But not when creating an index like this:
Sample rates
If we change the update stats to sample 10 percent instead of doing a fullscan:
Without an index this amounts to
With an index this amounts to
Auto created statistics with sample rates are not positively impacted by indexes on the column where the column is the first key column. Why this happens is explained further below
Comparison with updating the index' stats
As expected, updating the statistic on the index takes the same amount of time as the auto created statistics.
Sampled statistic update on an indexed column stat
The query used when updating statistics with a sample rate of 10 percent is different:
TABLESAMPLE
will give us random pages and it's explanation can be found here.The statistic orders by
val,id,step_direction([SC0]) over (order by NULL)
We can trace the query plan of this
StatMan
query with the profiler & theShowplan XML Statistics Profile
event:The execution plan shows the clustered index being used, instead of the expected nonclustered index:
I cannot share the query plan with paste the plan unfortunately, it is malformed.
The same is true for heap tables. The usage of the NCI would not make a difference here, the
TABLESAMPLE SYSTEM (9.999986e+000 PERCENT)
retrieves 10 percent of the data pages in a random order. This is why the CI was used.