Background
Data for the statistics object are gathered using a statement of the form:
SELECT
StatMan([SC0], [SC1], [SB0000])
FROM
(
SELECT TOP 100 PERCENT
[SC0], [SC1], STEP_DIRECTION([SC0]) OVER (ORDER BY NULL) AS [SB0000]
FROM
(
SELECT
[TextValue] AS [SC0],
[Id] AS [SC1]
FROM [dbo].[Test]
TABLESAMPLE SYSTEM (2.223684e+001 PERCENT)
WITH (READUNCOMMITTED)
) AS _MS_UPDSTATS_TBL_HELPER
ORDER BY
[SC0],
[SC1],
[SB0000]
) AS _MS_UPDSTATS_TBL
OPTION (MAXDOP 1)
You can collect this statement with Extended Events or Profiler (SP:StmtCompleted).
Statistics generation queries often access the base table (rather than a nonclustered index) to avoid the clustering of values that naturally occurs on nonclustered index pages.
The number of rows sampled depends on the number of whole pages selected for sampling. Each page of the table is either selected or it is not. All rows on selected pages contribute to the statistics.
Random numbers
SQL Server uses a random number generator to decide whether a page qualifies or not. The generator used in this instance is the Lehmer random number generator with the parameter values shown below:
Xnext = Xseed * 7^5 mod (2^31 - 1)
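As a minimal sketch (Python used for illustration; the constants come from the equation above), the generator can be written as:

```python
# Minimal sketch of the MINSTD Lehmer generator described above.
# Multiplier 16807 = 7**5 (0x41A7); modulus 2**31 - 1 (a Mersenne prime).
MULTIPLIER = 16807
MODULUS = 2**31 - 1

def lehmer_next(x_seed: int) -> int:
    """Return the next value: Xnext = Xseed * 7^5 mod (2^31 - 1)."""
    return (x_seed * MULTIPLIER) % MODULUS
```

Starting from seed 1, the first value is 16807. Note that any seed congruent to zero modulo 2^31 - 1 produces zero forever, which matters for the edge case discussed later.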
The value of Xseed is computed as the sum of:
- The low 32 bits of the base table's (bigint) partition_id, e.g.:
SELECT
P.[partition_id] & 0xFFFFFFFF
FROM sys.partitions AS P
WHERE
P.[object_id] = OBJECT_ID(N'dbo.Test', N'U')
AND P.index_id = 1;
- The value specified in the REPEATABLE clause. For sampled UPDATE STATISTICS, the REPEATABLE value is 1. This value is exposed in the m_randomSeed element of the access method's internal debugging information, shown in execution plans when trace flag 8666 is enabled, for example <Field FieldName="m_randomSeed" FieldValue="1" />.
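To make the seed computation concrete, here is a small Python sketch; the partition_id value is hypothetical, standing in for whatever the catalog query above returns:

```python
# Hypothetical bigint partition_id (a real value comes from sys.partitions).
partition_id = 72057594040614912
repeatable = 1  # REPEATABLE value: always 1 for sampled UPDATE STATISTICS

# Xseed = low 32 bits of partition_id + the REPEATABLE value
x_seed = (partition_id & 0xFFFFFFFF) + repeatable
```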
For SQL Server 2012, this calculation occurs in sqlmin!UnOrderPageScanner::StartScan:
mov edx,dword ptr [rcx+30h]
add edx,dword ptr [rcx+2Ch]
where memory at [rcx+30h] contains the low 32 bits of the partition id, and memory at [rcx+2Ch] contains the REPEATABLE value in use.
The random number generator is initialized later in the same method by a call to sqlmin!RandomNumGenerator::Init, where the instruction:
imul r9d,r9d,41A7h
...multiplies the seed by 41A7 hex (16807 decimal = 7^5) as shown in the equation above.
Later random numbers (for individual pages) are generated using the same basic code, inlined into sqlmin!UnOrderPageScanner::SetupSubScanner.
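The exact per-page test is not shown in the disassembly above, so the following Python sketch is an assumption about its shape: each page's next random draw, scaled to a percentage, is compared against the sample rate, so every page is independently in or out.

```python
# Assumed shape of the page-qualification test (not taken from the binary):
# one MINSTD draw per page, scaled to [0, 100), compared to the sample rate.
MULTIPLIER = 16807       # 7**5, as in the imul instruction above
MODULUS = 2**31 - 1

def sample_pages(x_seed: int, page_count: int, sample_percent: float) -> list[int]:
    """Return the page ids whose draw falls at or under the sample percentage."""
    selected = []
    for page_id in range(page_count):
        x_seed = (x_seed * MULTIPLIER) % MODULUS
        if x_seed * 100.0 / MODULUS <= sample_percent:
            selected.append(page_id)
    return selected
```

With this shape, a seed of 2^31 - 1 makes every draw zero, so every page qualifies regardless of the percentage, which is consistent with the edge case described below.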
StatMan
For the example StatMan query shown above, the same pages will be collected as for the T-SQL statement:
SELECT
COUNT_BIG(*)
FROM dbo.Test AS T
TABLESAMPLE SYSTEM (2.223684e+001 PERCENT) -- Same sample %
REPEATABLE (1) -- Always 1 for statman
WITH (INDEX(0)); -- Scan base object
This will match the output of:
SELECT
DDSP.rows_sampled
FROM sys.stats AS S
CROSS APPLY sys.dm_db_stats_properties(S.[object_id], S.stats_id) AS DDSP
WHERE
S.[object_id] = OBJECT_ID(N'dbo.Test', N'U')
AND S.[name] = N'IX_Test_TextValue';
Edge case
One consequence of using the MINSTD Lehmer random number generator is that seed values of zero and int.max must be avoided, since either one causes the algorithm to produce a sequence of zeroes (selecting every page).
The code detects a zero seed and uses a value from the system 'clock' as the seed in that case. It does not do the same when the seed is int.max (0x7FFFFFFF = 2^31 - 1).
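The degenerate sequence is easy to demonstrate in Python, using the same constants as before:

```python
MULTIPLIER = 16807
MODULUS = 2**31 - 1  # 0x7FFFFFFF, i.e. int.max

def lehmer_run(x_seed: int, n: int) -> list[int]:
    """Generate n successive MINSTD values from a starting seed."""
    out = []
    for _ in range(n):
        x_seed = (x_seed * MULTIPLIER) % MODULUS
        out.append(x_seed)
    return out

# Seeds 0 and int.max are both congruent to 0 mod (2^31 - 1),
# so the generator collapses to an endless run of zeroes.
```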
We can engineer this scenario, since the initial seed is calculated as the sum of the low 32 bits of the partition id and the REPEATABLE value. The REPEATABLE value that will result in the seed being int.max, and therefore every page being selected for the sample, is:
SELECT
0x7FFFFFFF - (P.[partition_id] & 0xFFFFFFFF)
FROM sys.partitions AS P
WHERE
P.[object_id] = OBJECT_ID(N'dbo.Test', N'U')
AND P.index_id = 1;
Working that into a complete example:
DECLARE @SQL nvarchar(4000) =
N'
SELECT
COUNT_BIG(*)
FROM dbo.Test AS T
TABLESAMPLE (0 PERCENT)
REPEATABLE (' +
(
SELECT TOP (1)
CONVERT(nvarchar(11), 0x7FFFFFFF - (P.[partition_id] & 0xFFFFFFFF))
FROM sys.partitions AS P
WHERE
P.[object_id] = OBJECT_ID(N'dbo.Test', N'U')
AND P.index_id = 1
) + ')
WITH (INDEX(0));';
PRINT @SQL;
--EXECUTE (@SQL);
That will select every row on every page, whatever the TABLESAMPLE clause says (even zero percent).
Best Answer
Let's break down the command you're running:
Even once a week, this is absurdly aggressive. I know the 5%/30% thresholds come from ancient Microsoft advice, but it's just that: ancient. It's time to move on.
You're taking the time to defragment indexes that are 500 8KB pages. That's a 4MB table. If your hardware has issues reading 4MB into memory, or keeping a 4MB table in memory, the answer isn't index maintenance. That's why I crank this up to at least 5000.
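The arithmetic behind that size (Python, just for the back-of-the-envelope numbers):

```python
# 500 data pages at 8 KB each
pages = 500
page_bytes = 8 * 1024

table_bytes = pages * page_bytes
table_mb = table_bytes / (1024 * 1024)  # about 3.9 MB -- the "4MB table" above
```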
Modern hardware (SANs, SSDs, and 64-bit servers that can hold far more than 4GB of RAM) simply doesn't have the same data access issues that spinning-platter drives from the early 2000s had.
We're talking record players vs. CDs.
Index rebuilds should happen extremely infrequently, and only to either correct a problem with an index, or change a setting. Here's the thing: 30% fragmentation isn't a real problem. It's just something DBAs do because they've been told to, and it's something they can measure.
So here's my question to you: how much time and resources does your server expend on index maintenance, and how much does that reduce time and resources consumed by queries?
If you can measure that, you can rebuild and reorg as much as you want.
Kinda sorta. Auto-update stats (even async) happens when 20% of the table data + 500 rows (assuming the table is > 500 rows) changes. If you have a table with a million rows in it, that's 200k rows. You can dynamically decrease that threshold by using Trace Flag 2371, but auto update stats uses the default sampling algorithm, which may not be enough for tables with heavily skewed data.
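The thresholds mentioned can be sketched numerically. The sqrt(1000 * rows) form is the commonly documented approximation of the Trace Flag 2371 dynamic threshold; treat it as an approximation, not the exact internal rule:

```python
import math

def static_threshold(rows: int) -> float:
    """Classic auto-update trigger: 20% of rows plus 500 (tables > 500 rows)."""
    return 0.20 * rows + 500

def dynamic_threshold(rows: int) -> float:
    """Approximate TF 2371 trigger: sqrt(1000 * rows) for larger tables."""
    return math.sqrt(1000 * rows)
```

For a million-row table the static rule needs about 200,500 changed rows before stats auto-update, while the dynamic rule fires after roughly 31,600.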
Well, yeah, I'm answering questions about a server I've never looked at. I prefer nightly stats updates, but I've seen servers that needed them more regularly than that.
So what should you do?
- Start by dialing back index maintenance to the commands I posted in the question you're referencing.
- If you run into problems, decrease fragmentation thresholds until they stop. If no one says anything, run index maintenance less frequently, until maybe, just maybe, you never run it at all.
- Update stats nightly.
Start with the default sampling threshold. If you find that the full-scan stats update that occurs with a rebuild is necessary, use the CommandLog table to find which tables and indexes were regularly being rebuilt, and start focusing on those. They'll typically be indexes on "big" tables, with tens of millions of rows. This is about as specific as I can get about a server I've never seen; you'll have to take it from here with experimentation.
See also my post, Why most of you should leave Auto-Update Statistics on.