Sql-server – Incremental statistics: find out which partitions have been resampled

dmvpartitioningsql serversql server 2014statistics

TL;DR: Is it possible to find out which partition(s) have been resampled and which have not when incremental statistics are used? Platform is SQL Server 2014 Enterprise edition.

The long version with some background information is like so.

Assuming a fairly typical DW environment, there is a partitioned table. The partitioning is based on a date column. This is done as staging data is loaded on separate table and after pre-processing, partition switching is used to move data into the production fact table. Oh, and a clustered columnstore index is in use. There are about a thousand partitions used. The DB is running on a virtual machine.

There are some 7.5 gigarows (100 GB) in the fact table. Daily growth is about five megarows. This is way too small a growth rate to trigger automatic statistics update, save trace flag 2371 (which haven't been tried).

The knee-jerk developer reaction to outdated statistics was updating them. For 7.5 gigarows, full update for all the stats takes some five hours. For a single statistics update, the processing performance is around 20 minutes or 90 megarows per second.

As the system is on a VM platform, business rules limit its costs. Neither memory nor IOPS are easily increased. Five hours update job is way too slow to be included on nightly ETL process, so either the stats stay outdated, are updated on unexpected time or will be updated in a maintenance window.

As the SQL Server is version 2014 Enterprise edition, it supports incremental statistics which sound just like the solution. After converting the stats to incremental ones, processing a single stat for single partition takes only 20 seconds. Grand total for newly-switched partition is about five minutes. This sounds great and certainly fits in the ETL process.

The thing that I wonder is how to manage incremental stats in partition switched environment. Assuming the stats are converted and updated as incremental on date D, how does one find out unprocessed partitions on, say, date D+2? Updating stats in ETL process is trivial, as the switching process obviously is aware of the partition id. But if there are partitions that are not resampled, how does one find those?

  • Last update for stats can be found from sys.dm_db_stats_properties
  • Partition numbers are available from sys.partitions
  • Partitioning function values are in sys.partition_range_values

One could pick last update date L for a stat and compare it to today's date T. Then calculate which partition id L points and wheter it is the same as T's. Then proceed with update for all the partition ids [L, T). This sounds tricky and error prone, so is there better a way? A DMV that shows which partitions are used for resampling would be nice, but there isn't one, is there?

Best Answer

My first question would be to ask why you're actually using incremental in the first place. Here's an answer that I posted regarding incremental statistics, a blog post by Erin Stellato that illuminates one of the primary complaints and pitfalls with incremental statistics (they're not used at the partition level by the optimizer), and two blog posts by me that work through evaluating any potential use case for incremental statistics.

Having said that, to know when a partition's stats have been sampled you can use an undocumented DMF (sys.dm_db_stats_properties_internal()) to get partition-level information. I have a comment on this blog post that describes how to understand the hierarchy at a fairly high level.

select 
    sysdatetime(),                          
    schema_name = sh.name,
    table_name = t.name,
    stat_name = s.name,
    index_name = i.name,
    leading_column = index_col(quotename(sh.name)+'.'+quotename(t.name),s.stats_id,1),
    s.stats_id,
    parition_number = isnull(sp.partition_number,1),
    s.has_filter,                       
    s.is_incremental,
    s.auto_created,
    sp.last_updated,    
    sp.rows,
    sp.rows_sampled,                        
    sp.unfiltered_rows,
    modification_counter = coalesce(sp.modification_counter, n1.modification_counter) 
from sys.stats s 
join sys.tables t 
    on s.object_id = t.object_id
join sys.schemas sh
    on t.schema_id = sh.schema_id
left join sys.indexes i 
    on s.object_id = i.object_id
        and s.name = i.name
cross apply sys.dm_db_stats_properties_internal(s.object_id, s.stats_id) sp
outer apply sys.dm_db_stats_properties_internal(s.object_id, s.stats_id) n1
where n1.node_id = 1
    and (
            (is_incremental = 0)
               or
            (is_incremental = 1 and sp.partition_number is not null)
         )
    and t.name = '<<TABLENAME>>'
    and s.name like '<<STATNAME>>%'
order by s.stats_id,isnull(sp.partition_number,1);