SQL Server – a good guideline for sampling for statistics updates

index-statistics, index-tuning, sql-server, sql-server-2012

Scope: a billion-row table with 10 indexes. The rate of change is about half a percent per day.

Some of the indexes are on monotonically increasing keys (datetime/timestamp), yet the common queries most likely hit the tail end. I assume statistics need updating frequently for this type, otherwise the most recent data falls beyond the histogram and goes unrepresented?
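
For the ascending-key case, one way to see whether the tail is still covered is to compare the highest histogram step against the newest data. A minimal sketch, with hypothetical table, index, and column names:

```sql
-- Compare the highest RANGE_HI_KEY in the histogram with the newest row.
-- If MAX(DatetimeColumn) is well past the last histogram step, recent data
-- is effectively invisible to the optimizer until the statistic is updated.
DBCC SHOW_STATISTICS ('dbo.BigTable', 'IX_BigTable_Datetime') WITH HISTOGRAM;

SELECT MAX(DatetimeColumn) AS NewestRow FROM dbo.BigTable;
```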

Other indexes are more randomly distributed, e.g. (Customer Key, Datetime). These can presumably get by with less frequent updates, since the existing statistics remain representative of the whole. Can we simply let enough data change to trigger the automatic statistics update on these?
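
To judge whether the automatic update has fired recently, or is anywhere near firing, the modification counter and last-update time can be inspected per statistic. A sketch, again with hypothetical names:

```sql
-- How stale is each statistic on the table, and how many rows have
-- been modified since it was last updated?
SELECT  s.name            AS stats_name,
        sp.last_updated,
        sp.rows,
        sp.rows_sampled,
        sp.modification_counter
FROM    sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE   s.object_id = OBJECT_ID('dbo.BigTable');
```

Bear in mind that on a billion-row table the default auto-update threshold (roughly 20% of rows changed) takes a very long time to reach at 0.5% change per day; on SQL Server 2012, trace flag 2371 lowers that threshold dynamically for large tables.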

For both types, is there any benefit to increasing the sampling rate from 10% to 100%, if the sampled data is random and quite representative of the whole?
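
For reference, the sampling rate is chosen per statistics update, so the two approaches can be compared directly on one index. A sketch with hypothetical names:

```sql
-- Sampled update: fast, and usually adequate when the sampled pages
-- are representative of the overall key distribution.
UPDATE STATISTICS dbo.BigTable IX_BigTable_Datetime WITH SAMPLE 10 PERCENT;

-- Full scan: reads the entire index; mainly worthwhile when sampling
-- misses skew (very uneven key distributions), not for uniform data.
UPDATE STATISTICS dbo.BigTable IX_BigTable_Datetime WITH FULLSCAN;
```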

Looking for best practices for multi-terabyte data.

Best Answer

If you always query the tail, you could give filtered indexes a shot. Index the 10 ways with, say, WHERE datetimecolumn > @INDEXBUILDDAY-30. Periodically (daily? weekly?) you move this filter forward by adding (online) a new set of filtered indexes and then dropping the old ones. Being much smaller and much more recently built (which implies recent stats), they stay 'fresh' and are very appetizing for the optimizer. The periodic rebuild doesn't even need to scan the base table (no huge IO, no buffer pool pollution): the old filtered index should always be usable as the source for building the next filtered index, since it covers it.
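
A sketch of that roll-forward idea (index, column names, and dates are hypothetical; note that a filtered index predicate must use a literal, so the "@INDEXBUILDDAY-30" boundary has to be baked in by whatever job builds the new index):

```sql
-- Build the new "tail" index online with a fresher filter boundary...
CREATE NONCLUSTERED INDEX IX_BigTable_Tail_20240201
    ON dbo.BigTable (DatetimeColumn, CustomerKey)
    WHERE DatetimeColumn > '20240101'     -- literal boundary, e.g. build day - 30
    WITH (ONLINE = ON);

-- ...then drop the previous tail index once the new one is in place.
DROP INDEX IX_BigTable_Tail_20240101 ON dbo.BigTable;
```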

For those queries that land outside the covered filtered tail, you can keep a few (fewer than the 10 ways, hopefully) full, unfiltered indexes. With the right voodoo invocation the optimizer might get your point and use the filtered indexes when appropriate, resorting to the full ones only when necessary...
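
Part of that "voodoo" is plan caching: with a parameterized predicate the optimizer generally cannot prove the parameter falls inside the filter, so it tends to ignore the filtered index. A common workaround, shown here on a hypothetical query shape, is to recompile per execution:

```sql
DECLARE @from datetime2 = DATEADD(DAY, -30, SYSUTCDATETIME());

-- OPTION (RECOMPILE) lets the optimizer see the runtime value of @from,
-- so it can match the filtered "tail" index when the value qualifies
-- and fall back to a full index when it does not.
SELECT  CustomerKey, DatetimeColumn
FROM    dbo.BigTable
WHERE   DatetimeColumn > @from
OPTION  (RECOMPILE);
```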