SQL Server Statistics – How Does Sampling Work When Updating Statistics?

sql serverstatistics

I have several massive tables. I'd like to ensure their statistics are up to date via a weekly maintenance plan.

However, doing so is taking too much time.

If I specify

WITH SAMPLE 50 PERCENT

does SQL Server then sample:

  1. the first 50% of pages
  2. every other page
  3. or some other strategy?

BOL is not clear on this.

Best Answer

For WITH SAMPLE 50 PERCENT it works as though for each data page in the table SQL Server flips a coin. If it lands heads then it reads all the rows on the page. If it lands tails then it reads none.

Tracing the UPDATE STATISTICS T WITH SAMPLE 50 PERCENT call in Profiler shows the following query is emitted

SELECT StatMan([SC0], [SB0000])
FROM   (SELECT TOP 100 PERCENT [SC0],
                               step_direction([SC0]) OVER (ORDER BY NULL) AS [SB0000]
        FROM   (SELECT [C] AS [SC0]
                FROM   [dbo].[T] TABLESAMPLE SYSTEM (5.000000e+001 PERCENT) 
                WITH (READUNCOMMITTED)) AS _MS_UPDSTATS_TBL_HELPER
        ORDER  BY [SC0],
                  [SB0000]) AS _MS_UPDSTATS_TBL
OPTION (MAXDOP 1) 

With Plan

enter image description here

TABLESAMPLE SYSTEM (5.000000e+001 PERCENT) is responsible for the sampling and is documented here

TABLESAMPLE SYSTEM returns an approximate percentage of rows and generates a random value for each physical 8-KB page in the table. Based on the random value for a page and the percentage specified in the query, a page is either included in the sample or excluded. Each page that is included returns all rows in the sample result set.

The documentation also states

Although the plan shows a table scan is performed, only those pages that are included in the result set are actually required to be read from the data file.

The STATMAN call is to an internal aggregate function briefly described here