Sql-server – Filtered index statistics refresh threshold

index-tuningsql-server-2008

We have filtered indexes in our production environment. While doing some research about them, I came across this article "Filtered indexes and filtered stats might become seriously out-of-date"

It's a fairly simple filtered index based on a code value of 0

CREATE NONCLUSTERED INDEX 
    [IX_InsuranceOffer_FIX_OfferCode0] 
    ON [dbo].[InsuranceOffer]
(
    [OfferId] ASC
)
WHERE ([OfferStatus]=(0))
WITH (PAD_INDEX = OFF,   STATISTICS_NORECOMPUTE = OFF
, SORT_IN_TEMPDB = OFF,  DROP_EXISTING = OFF
, ONLINE = OFF,          ALLOW_ROW_LOCKS = ON
, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY]

distribution looks like

code   codeCount   code_distribution
------ ----------- -----------------
6      26186769    93.7526
0      1743401     6.2416
5      1107        0.0040
7      495         0.0018

Our intention was to modify the existing index to include code 5. Based on this tipping point article, I believe both queries should continue to use the filtered index.

I have questions out to the system owners trying to understand the volatility of the codes.

Until then, I looked at sys.dm_db_index_physical_stats an attempt to understand whether our current index rebuild/reorganization strategy is sufficient for keeping up with the filtered index. I suspect it isn't but my internal-fu is weak.

index_level avg_fragmentation_in_percent            fragment_count       avg_fragment_size_in_pages page_count           avg_page_space_used_in_percent          record_count
----------- --------------------------------------- -------------------- -------------------------- -------------------- --------------------------------------- --------------------
0           0.6276                                  20                   143.4                      2868                 90.0984                                 1743401
1           42.8571                                 7                    1                          7                    86.0285                                 2868
2           0.0000                                  1                    1                          1                    1.4455                                  7

sys.stats for this index shows it was last updated '2012-01-06 22:03:11.147'

Is the above data sufficient information to base an index rebuild decision upon or would I need to have additional metrics involved? Or, for filtered indexes do we even care about fragmentation and should just explicitly update statistics at X interval?

I claim the answer "it depends"

Only semi-related question was Minimum rowcount for filtered index?

Best Answer

You've got two questions in here:

Is the above data sufficient information to base an index rebuild decision upon or would I need to have additional metrics involved?

One missing link is the size of the index. If you're talking about an object with less than, say, 1000 pages, then index rebuilds aren't all that critical.

Another missing link would be the churn of the index. Typically I see filtered indexes used when they're a very, very small subset of the entire table, and the subset changes fast. Guessing by the name of your filtering field (OfferStatus = 0), it sounds like you're indexing just the rows where you haven't made an offer yet, and then you're going to immediately turn around and make an offer. In situations like that, the data's changing so fast that index rebuilds usually don't make sense.

Or, for filtered indexes do we even care about fragmentation and should just explicitly update statistics at X interval?

SQL Server updates stats on objects when ~20% of the data changes, but filtered indexes & stats are a special case. They're also updated when 20% of the data changes - but it's 20% of the base table, not 20% of the filtered subset. Because of that, you probably want to manually update stats on them periodically. I love Ola Hallengren's maintenance scripts for this - the index maintenance stored proc has a parameter for updating statistics, and another parameter for choosing what level of sampling you want, and another parameter for choosing whether to update stats on all objects or only the ones with changed rows. It's fantastic.