Sql-server – Filtered index statistics refresh threshold

index-tuning sql-server-2008

We have filtered indexes in our production environment. While doing some research about them, I came across this article: "Filtered indexes and filtered stats might become seriously out-of-date".

Ours is a fairly simple filtered index based on a code value of 0:

CREATE NONCLUSTERED INDEX 
    [IX_InsuranceOffer_FIX_OfferCode0] 
    ON [dbo].[InsuranceOffer]
(
    [OfferId] ASC
)
WHERE ([OfferStatus]=(0))
WITH (PAD_INDEX = OFF,   STATISTICS_NORECOMPUTE = OFF
, SORT_IN_TEMPDB = OFF,  DROP_EXISTING = OFF
, ONLINE = OFF,          ALLOW_ROW_LOCKS = ON
, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY]

The distribution of codes in the table looks like this:

code   codeCount   code_distribution
------ ----------- -----------------
6      26186769    93.7526
0      1743401     6.2416
5      1107        0.0040
7      495         0.0018

Our intention was to modify the existing index to include code 5. Based on this tipping point article, I believe both queries should continue to use the filtered index.
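
A minimal sketch of what that change might look like, assuming a new index is created alongside the old one and the old one is dropped afterwards. The index name IX_InsuranceOffer_FIX_OfferCode0_5 is hypothetical, and including OfferStatus is an assumption of this sketch rather than part of the existing definition:

```sql
-- Sketch only: the index name IX_InsuranceOffer_FIX_OfferCode0_5 is hypothetical.
-- OfferStatus is added as an included column (an assumption) so that a query
-- asking for a single code can tell the two codes apart from the index alone.
CREATE NONCLUSTERED INDEX
    [IX_InsuranceOffer_FIX_OfferCode0_5]
    ON [dbo].[InsuranceOffer]
(
    [OfferId] ASC
)
INCLUDE ([OfferStatus])
WHERE ([OfferStatus] IN (0, 5))
WITH (FILLFACTOR = 90) ON [PRIMARY];

-- Drop the original single-code index once the new one has been verified.
DROP INDEX [IX_InsuranceOffer_FIX_OfferCode0] ON [dbo].[InsuranceOffer];
```

Whether queries for a single code still pick the wider index would need to be confirmed against the actual execution plans.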

I have questions out to the system owners trying to understand the volatility of the codes.

Until then, I looked at sys.dm_db_index_physical_stats in an attempt to understand whether our current index rebuild/reorganization strategy is sufficient for keeping up with the filtered index. I suspect it isn't, but my internals-fu is weak.

index_level avg_fragmentation_in_percent            fragment_count       avg_fragment_size_in_pages page_count           avg_page_space_used_in_percent          record_count
----------- --------------------------------------- -------------------- -------------------------- -------------------- --------------------------------------- --------------------
0           0.6276                                  20                   143.4                      2868                 90.0984                                 1743401
1           42.8571                                 7                    1                          7                    86.0285                                 2868
2           0.0000                                  1                    1                          1                    1.4455                                  7
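
For reference, output of this shape can be obtained with a query along the following lines. This is a sketch, not necessarily the exact query that was run; it assumes the current database holds dbo.InsuranceOffer and uses DETAILED mode, which is what returns one row per index level along with record_count and avg_page_space_used_in_percent:

```sql
-- Resolve the table and index ids first, then pass them as plain variables.
DECLARE @object_id int = OBJECT_ID(N'dbo.InsuranceOffer');
DECLARE @index_id int = (
    SELECT i.index_id
    FROM sys.indexes i
    WHERE i.object_id = @object_id
        AND i.name = N'IX_InsuranceOffer_FIX_OfferCode0'
);

-- Note: if @index_id is NULL (index not found), the function returns
-- rows for every index on the table instead of just one.
SELECT ips.index_level
    , ips.avg_fragmentation_in_percent
    , ips.fragment_count
    , ips.avg_fragment_size_in_pages
    , ips.page_count
    , ips.avg_page_space_used_in_percent
    , ips.record_count
FROM sys.dm_db_index_physical_stats(
        DB_ID(), @object_id, @index_id, NULL, N'DETAILED') ips
ORDER BY ips.index_level;
```

DETAILED mode reads every page of the index, so on a large table it is worth running outside peak hours.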

sys.stats for this index shows it was last updated '2012-01-06 22:03:11.147'

Is the above data sufficient information to base an index rebuild decision upon or would I need to have additional metrics involved? Or, for filtered indexes do we even care about fragmentation and should just explicitly update statistics at X interval?

I suspect the answer is "it depends".

The only semi-related question I found was "Minimum rowcount for filtered index?"

Best Answer

You've got two questions in here:

Is the above data sufficient information to base an index rebuild decision upon or would I need to have additional metrics involved?

One missing link is the size of the index. If you're talking about an object with less than, say, 1000 pages, then index rebuilds aren't all that critical.

Another missing link would be the churn of the index. Typically I see filtered indexes used when they're a very, very small subset of the entire table, and the subset changes fast. Guessing by the name of your filtering field (OfferStatus = 0), it sounds like you're indexing just the rows where you haven't made an offer yet, and then you're going to immediately turn around and make an offer. In situations like that, the data's changing so fast that index rebuilds usually don't make sense.

Or, for filtered indexes do we even care about fragmentation and should just explicitly update statistics at X interval?

SQL Server updates stats on objects when ~20% of the data changes (on SQL Server 2008, roughly 500 rows plus 20% of the rows for tables with more than 500 rows), but filtered indexes & stats are a special case. They're also updated when 20% of the data changes - but it's 20% of the base table, not 20% of the filtered subset. With the distribution in your question, that's about 5.6 million modifications against the ~27.9 million-row table before the stats on your 1.7 million-row filtered index refresh on their own. Because of that, you probably want to manually update stats on them periodically. I love Ola Hallengren's maintenance scripts for this - the index maintenance stored proc has a parameter for updating statistics, another parameter for choosing what level of sampling you want, and another parameter for choosing whether to update stats on all objects or only the ones with changed rows. It's fantastic.
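
If you'd rather check and refresh the filtered stats by hand, something like the following sketch may help. It assumes the table and index names from the question, and it relies on sys.dm_db_stats_properties, which as far as I know only exists from SQL Server 2008 R2 SP2 and 2012 SP1 onward; on an older build you would have to fall back to STATS_DATE alone:

```sql
-- Which filtered statistics exist on the table, when were they last
-- updated, and how many modifications have accumulated since then?
SELECT StatsName = s.name
    , s.filter_definition
    , sp.last_updated
    , FilteredRows = sp.rows
    , sp.unfiltered_rows
    , sp.modification_counter
FROM sys.stats s
    CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) sp
WHERE s.object_id = OBJECT_ID(N'dbo.InsuranceOffer')
    AND s.has_filter = 1;

-- Refresh only the statistics that belong to the filtered index.
UPDATE STATISTICS dbo.InsuranceOffer IX_InsuranceOffer_FIX_OfferCode0
    WITH FULLSCAN;
```

Comparing modification_counter against FilteredRows rather than unfiltered_rows gives a feel for how stale the filtered stats are relative to the subset they describe.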

Related Solutions

Sql-server – Minimum rowcount for filtered index

For example, if you have 100,000 rows in a table that gets, say, 75% read operations, would it be wise to add a filtered index that only covers 500 rows? Or 100 rows? What if the filter covers 85,000 of those rows?

That would depend on the queries against the table. If query patterns are such that they target a subset of the 500 or 100 rows that your filtered index covers, perfect. On the flip side, it's unlikely that the optimiser is going to choose a filtered index that includes 85% unless it is a covering index for a particular query.

I haven't tested this but I would expect filtered index utilisation to exhibit the same tipping point behaviour as non-filtered indexes.

Sql-server – Does a re-index update statistics

You can keep the following in mind when thinking about updating statistics (copied from "Rebuilding Indexes vs. Updating Statistics" by Benjamin Nevarez):

By default, the UPDATE STATISTICS statement uses only a sample of records of the table. Using UPDATE STATISTICS WITH FULLSCAN will scan the entire table.
By default, the UPDATE STATISTICS statement updates both index and column statistics. Using the COLUMNS option will update column statistics only. Using the INDEX option will update index statistics only.
Rebuilding an index, for example by using ALTER INDEX … REBUILD will also update index statistics with the equivalent of using WITH FULLSCAN unless the table is partitioned, in which case the statistics are only sampled (applies to SQL Server 2012 and later).
Statistics that were manually created using CREATE STATISTICS are not updated by any ALTER INDEX ... REBUILD operation, including ALTER TABLE ... REBUILD. ALTER TABLE ... REBUILD does update statistics for the clustered index, if one is defined on the table being rebuilt.
Reorganizing an index, for example using ALTER INDEX … REORGANIZE does not update any statistics.

The short answer is that you need to use UPDATE STATISTICS to update column statistics and that an index rebuild will update only index statistics. You can force an update to all statistics on a table, including index-stats and manually created stats, with the UPDATE STATISTICS (tablename) WITH FULLSCAN; syntax.

The following code illustrates the rules encapsulated above:

First, we'll create a table with a couple of columns, and a clustered index:

USE tempdb;

IF OBJECT_ID(N'dbo.SomeTable', N'U') IS NOT NULL
DROP TABLE dbo.SomeTable;

CREATE TABLE dbo.SomeTable
(
    rn int NOT NULL IDENTITY(1,1)
        CONSTRAINT pk
        PRIMARY KEY NONCLUSTERED
    , i int NOT NULL INDEX i 
    , d sysname NOT NULL
) ON [PRIMARY] WITH (DATA_COMPRESSION = NONE);

CREATE UNIQUE CLUSTERED INDEX cx ON dbo.SomeTable (i, d);

CREATE STATISTICS d ON dbo.SomeTable (d) WITH FULLSCAN;

INSERT INTO dbo.SomeTable (d, i)
SELECT c1.name, c1.id
FROM sys.syscolumns c1;

This query shows the date when each stats object was last updated:

SELECT ObjectName = sc.name + N'.' + o.name
    , StatsName = s.name
    , StatsDate = STATS_DATE(s.object_id, s.stats_id)
FROM sys.stats s
    INNER JOIN sys.objects o ON s.object_id = o.object_id
    INNER JOIN sys.schemas sc ON o.schema_id = sc.schema_id
WHERE sc.name = N'dbo'
    AND o.name = N'SomeTable';

The results show no updates have yet taken place, which is correct since we just created the table:

╔═══════════════╦═══════════╦═══════════╗
║  ObjectName   ║ StatsName ║ StatsDate ║
╠═══════════════╬═══════════╬═══════════╣
║ dbo.SomeTable ║ cx        ║ NULL      ║
║ dbo.SomeTable ║ i         ║ NULL      ║
║ dbo.SomeTable ║ pk        ║ NULL      ║
║ dbo.SomeTable ║ d         ║ NULL      ║
╚═══════════════╩═══════════╩═══════════╝

Let's rebuild the entire table, and see if that updates stats:

ALTER TABLE dbo.SomeTable REBUILD;

SELECT ObjectName = sc.name + N'.' + o.name
    , StatsName = s.name
    , StatsDate = STATS_DATE(s.object_id, s.stats_id)
FROM sys.stats s
    INNER JOIN sys.objects o ON s.object_id = o.object_id
    INNER JOIN sys.schemas sc ON o.schema_id = sc.schema_id
WHERE sc.name = N'dbo'
    AND o.name = N'SomeTable';

╔═══════════════╦═══════════╦═════════════════════════╗
║  ObjectName   ║ StatsName ║        StatsDate        ║
╠═══════════════╬═══════════╬═════════════════════════╣
║ dbo.SomeTable ║ cx        ║ 2018-09-17 14:09:13.590 ║
║ dbo.SomeTable ║ i         ║ NULL                    ║
║ dbo.SomeTable ║ pk        ║ NULL                    ║
║ dbo.SomeTable ║ d         ║ NULL                    ║
╚═══════════════╩═══════════╩═════════════════════════╝

The results show only the clustered index stats were updated.

Next, we perform a discrete UPDATE STATISTICS operation against a single statistics object:

UPDATE STATISTICS dbo.SomeTable(d) WITH FULLSCAN;

SELECT ObjectName = sc.name + N'.' + o.name
    , StatsName = s.name
    , StatsDate = STATS_DATE(s.object_id, s.stats_id)
FROM sys.stats s
    INNER JOIN sys.objects o ON s.object_id = o.object_id
    INNER JOIN sys.schemas sc ON o.schema_id = sc.schema_id
WHERE sc.name = N'dbo'
    AND o.name = N'SomeTable';

As you can see, we've just updated the stats on the d column:

╔═══════════════╦═══════════╦═════════════════════════╗
║  ObjectName   ║ StatsName ║        StatsDate        ║
╠═══════════════╬═══════════╬═════════════════════════╣
║ dbo.SomeTable ║ cx        ║ 2018-09-17 14:09:13.590 ║
║ dbo.SomeTable ║ i         ║ NULL                    ║
║ dbo.SomeTable ║ pk        ║ NULL                    ║
║ dbo.SomeTable ║ d         ║ 2018-09-17 14:09:13.597 ║
╚═══════════════╩═══════════╩═════════════════════════╝

Now, we'll update stats on the entire table:

UPDATE STATISTICS dbo.SomeTable WITH FULLSCAN;

SELECT ObjectName = sc.name + N'.' + o.name
    , StatsName = s.name
    , StatsDate = STATS_DATE(s.object_id, s.stats_id)
FROM sys.stats s
    INNER JOIN sys.objects o ON s.object_id = o.object_id
    INNER JOIN sys.schemas sc ON o.schema_id = sc.schema_id
WHERE sc.name = N'dbo'
    AND o.name = N'SomeTable';

╔═══════════════╦═══════════╦═════════════════════════╗
║  ObjectName   ║ StatsName ║        StatsDate        ║
╠═══════════════╬═══════════╬═════════════════════════╣
║ dbo.SomeTable ║ cx        ║ 2018-09-17 14:09:13.600 ║
║ dbo.SomeTable ║ i         ║ 2018-09-17 14:09:13.600 ║
║ dbo.SomeTable ║ pk        ║ 2018-09-17 14:09:13.603 ║
║ dbo.SomeTable ║ d         ║ 2018-09-17 14:09:13.607 ║
╚═══════════════╩═══════════╩═════════════════════════╝

As you can see, the only way to be certain all the stats are updated is to either update each one manually, or to update the entire table with UPDATE STATISTICS (table);.
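
The same demo table can be used to check the remaining rules from the list, which the walkthrough above does not exercise. The following is a sketch that reuses dbo.SomeTable and its index names; going by the quoted rules, the rebuild should move only the pk stats date, the reorganize should move none, and the INDEX and COLUMNS options should each touch only their own kind of statistics:

```sql
USE tempdb;

-- Rebuild a single nonclustered index: per the rules, only "pk" should change.
ALTER INDEX pk ON dbo.SomeTable REBUILD;

-- Reorganize another index: per the rules, no statistics should change.
ALTER INDEX i ON dbo.SomeTable REORGANIZE;

-- Update index statistics only (cx, i, pk), leaving column stats alone.
UPDATE STATISTICS dbo.SomeTable WITH FULLSCAN, INDEX;

-- Update column statistics only (the manually created "d").
UPDATE STATISTICS dbo.SomeTable WITH FULLSCAN, COLUMNS;

-- Same inspection query as above; run it after each statement to see
-- which StatsDate values moved.
SELECT ObjectName = sc.name + N'.' + o.name
    , StatsName = s.name
    , StatsDate = STATS_DATE(s.object_id, s.stats_id)
FROM sys.stats s
    INNER JOIN sys.objects o ON s.object_id = o.object_id
    INNER JOIN sys.schemas sc ON o.schema_id = sc.schema_id
WHERE sc.name = N'dbo'
    AND o.name = N'SomeTable';
```

Because the statements complete within milliseconds of each other, it is easier to run them one at a time than as a single batch.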
