SQL Server Data Warehouse – Should Auto Update Statistics Be Disabled?

data-warehouse | sql-server | statistics

I have a 200 GB data warehouse in SQL Server.

I have been experiencing really slow execution times for some queries; for example, 12 hours for a simple delete query with an inner join.

After doing some research with the execution plans, I've updated the statistics of the 2 tables involved in the query, using the WITH FULLSCAN option.

The query now executes in less than a second, so it appears the statistics were not up to date.

I'm considering disabling auto update statistics on the database and running UPDATE STATISTICS manually after the data warehouse is loaded. The data warehouse is loaded incrementally from a source ERP system daily, at night.
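For context, the post-load step being considered would look something like the following (the table name `dbo.FactSales` is a stand-in for your own fact tables, not something from the original question):

```sql
-- Hypothetical nightly post-load step: full-scan stats refresh on a loaded table.
UPDATE STATISTICS dbo.FactSales WITH FULLSCAN;

-- Alternatively, refresh every statistic in the database that has seen modifications
-- since the last update (uses the default sampling rate, not FULLSCAN):
EXEC sp_updatestats;
```

`sp_updatestats` is cheaper but samples; `UPDATE STATISTICS ... WITH FULLSCAN` on the key tables is more expensive but gives the optimizer the most accurate histograms.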

Am I correct in assuming that auto update statistics is not really useful in a data warehousing scenario? Instead, does it make more sense to update the statistics manually after the data is loaded?

Best Answer

Here is a whitepaper on when auto_update of stats occurs. Here are the salient points vis-a-vis automatic updates to statistics:

  • The table size has gone from 0 to >0 rows (test 1).
  • The number of rows in the table when the statistics were gathered was 500 or less, and the colmodctr of the leading column of the statistics object has changed by more than 500 since then (test 2).
  • The table had more than 500 rows when the statistics were gathered, and the colmodctr of the leading column of the statistics object has changed by more than 500 + 20% of the number of rows in the table when the statistics were gathered (test 3).

So @JNK made the point in a comment that if you have 1 billion rows in a table, you'd need 200,000,500 writes to the leading column of the statistic to trigger an update.
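You can check how close any statistic is to that test-3 threshold with `sys.dm_db_stats_properties` (available in SQL Server 2008 R2 SP2 / 2012 SP1 and later); this is a sketch against the `dbo.test_table` used below:

```sql
-- For each statistic on the table, compare the accumulated modification counter
-- against the classic auto-update threshold of 500 + 20% of rows (test 3 above).
SELECT  s.name                       AS stat_name,
        sp.rows,
        sp.modification_counter,
        500 + (0.20 * sp.rows)       AS auto_update_threshold,
        sp.last_updated
FROM    sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE   s.object_id = OBJECT_ID('dbo.test_table');
```

For the 1-billion-row example, `auto_update_threshold` comes out to 200,000,500, which makes the scale of the problem concrete.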

Let's take the following structure:

CREATE TABLE dbo.test_table (
    test_table_id INTEGER IDENTITY(1,1) NOT NULL, 
    test_table_value VARCHAR(50), 
    test_table_value2 BIGINT, 
    test_table_value3 NUMERIC(10,2)
);

CREATE CLUSTERED INDEX cix_test_table ON dbo.test_table (test_table_id, test_table_value);

Now we can check to see what happened in statistics land.

SELECT * 
    FROM sys.stats
        WHERE object_id = OBJECT_ID('dbo.test_table');

(screenshot: sys.stats output showing the statistic created for cix_test_table)

However, to see if this is a meaningful statistic object we need to:

DBCC SHOW_STATISTICS('dbo.test_table', cix_test_table);

(screenshot: DBCC SHOW_STATISTICS output showing the statistic has never been updated)

So this statistic hasn't been updated. It looks like the statistic isn't updated until a SELECT occurs, and even then the SELECT has to fall outside of what SQL Server has within its histogram. Here's the test script I ran to verify this:

    CREATE TABLE test_table (
        test_table_id INTEGER IDENTITY(1,1) NOT NULL, 
        test_table_value VARCHAR(50), 
        test_table_value2 BIGINT, 
        test_table_value3 NUMERIC(10,2)
    );

    CREATE CLUSTERED INDEX cix_test_table ON test_table (test_table_id, test_table_value);

    ALTER TABLE test_table ADD CONSTRAINT pk_test_table PRIMARY KEY  (test_table_id)

    SELECT * 
        FROM sys.stats
            WHERE object_id = OBJECT_ID('dbo.test_table')

    --DBCC SHOW_STATISTICS('dbo.test_table',pk_test_table)
    DBCC SHOW_STATISTICS('dbo.test_table',cix_test_table) WITH STAT_HEADER;

declare @test int = 0

WHILE @test < 1
    BEGIN
        INSERT INTO test_table (test_table_value,test_table_value2,test_table_value3) VALUES
            ('stats test' + CAST(@test AS VARCHAR(10)),@test, @test)
        SET @test = @test + 1;
    END

SELECT 'one row|select < 1', * FROM test_table WHERE test_table_id < 1;
--DBCC SHOW_STATISTICS('dbo.test_table',pk_test_table);
DBCC SHOW_STATISTICS('dbo.test_table',cix_test_table) WITH STAT_HEADER;

SET @test = 1

WHILE @test < 100
    BEGIN
        INSERT INTO test_table (test_table_value,test_table_value2,test_table_value3) VALUES
            ('stats test' + CAST(@test AS VARCHAR(10)),@test, @test)
        SET @test = @test + 1;
    END

SELECT '100 rows(add 99)|select < 100',* FROM test_table WHERE test_table_id < 100;
--DBCC SHOW_STATISTICS('dbo.test_table',pk_test_table);
DBCC SHOW_STATISTICS('dbo.test_table',cix_test_table) WITH STAT_HEADER;
--get the table up to 500 rows/changes
WHILE @test < 500
    BEGIN
        INSERT INTO test_table (test_table_value,test_table_value2,test_table_value3) VALUES
            ('stats test' + CAST(@test AS VARCHAR(10)),@test, @test)
        SET @test = @test + 1;
    END
SELECT '500 rows(add 400)|select < 100',* FROM test_table WHERE test_table_id < 100;
DBCC SHOW_STATISTICS('dbo.test_table',cix_test_table) WITH STAT_HEADER;
SELECT '500 rows(add 400)|select < 500',* FROM test_table WHERE test_table_id < 500;
--DBCC SHOW_STATISTICS('dbo.test_table',pk_test_table);
DBCC SHOW_STATISTICS('dbo.test_table',cix_test_table) WITH STAT_HEADER;
--bump it to 501
SET @test = 500;
WHILE @test < 501
    BEGIN
        INSERT INTO test_table (test_table_value,test_table_value2,test_table_value3) VALUES
            ('stats test' + CAST(@test AS VARCHAR(10)),@test, @test)
        SET @test = @test + 1;
    END


SELECT '501 rows(add 1)|select < 501',* FROM test_table WHERE test_table_id < 501;
--DBCC SHOW_STATISTICS('dbo.test_table',pk_test_table);
DBCC SHOW_STATISTICS('dbo.test_table',cix_test_table) WITH STAT_HEADER;

--bump it to 600
SET @test = 501;
WHILE @test < 600
    BEGIN
        INSERT INTO test_table (test_table_value,test_table_value2,test_table_value3) VALUES
            ('stats test' + CAST(@test AS VARCHAR(10)),@test, @test)
        SET @test = @test + 1;
    END

SELECT '600 rows (add 100)|select < 600',* FROM test_table WHERE test_table_id < 600;
--DBCC SHOW_STATISTICS('dbo.test_table',pk_test_table);
DBCC SHOW_STATISTICS('dbo.test_table',cix_test_table) WITH STAT_HEADER;

--bump it to 700
SET @test = 600;
WHILE @test < 700
    BEGIN
        INSERT INTO test_table (test_table_value,test_table_value2,test_table_value3) VALUES
            ('stats test' + CAST(@test AS VARCHAR(10)),@test, @test)
        SET @test = @test + 1;
    END

SELECT '700 rows (add 100)|select < 700', * FROM test_table WHERE test_table_id < 700;
--DBCC SHOW_STATISTICS('dbo.test_table',pk_test_table);
DBCC SHOW_STATISTICS('dbo.test_table',cix_test_table) WITH STAT_HEADER;

--bump it to 1200
SET @test = 700;
WHILE @test < 1200
    BEGIN
        INSERT INTO test_table (test_table_value,test_table_value2,test_table_value3) VALUES
            ('stats test' + CAST(@test AS VARCHAR(10)),@test, @test)
        SET @test = @test + 1;
    END

SELECT '1200 rows (add 500)|select < 1200',* FROM test_table WHERE test_table_id < 1200;
--DBCC SHOW_STATISTICS('dbo.test_table',pk_test_table);
DBCC SHOW_STATISTICS('dbo.test_table',cix_test_table) WITH STAT_HEADER;
--DROP TABLE test_table

Instead of blindly disabling auto_update statistics, I would try to examine your data set for skew. If your data exhibits significant skew, then you need to consider creating filtered statistics and then decide if managing statistics updates manually is the correct course of action.

To analyze for skew, run DBCC SHOW_STATISTICS(<table_name>, <stat_name>); (in the above script, without the WITH STAT_HEADER) on the particular stat/index combination you'd like to examine. A quick way to eyeball your skew is to look at the histogram (third result set) and check the variance in your EQ_ROWS. If it's fairly consistent, then your skew is minimal. To step it up, look at the RANGE_ROWS column and the variance there, since it measures how many rows exist between each step. Finally, you can take the [All density] value from the DENSITY_VECTOR (second result set), multiply it by the [Rows Sampled] value in the STAT_HEADER (first result set), and see what the average expectation for a query on that column would be. Compare that average to your EQ_ROWS: if it varies significantly in many places, then you've got skew.
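Walking through that against the test table, you can pull each result set individually rather than eyeballing all three at once:

```sql
-- First result set: note [Rows Sampled].
DBCC SHOW_STATISTICS('dbo.test_table', cix_test_table) WITH STAT_HEADER;

-- Second result set: note [All density] for the leading column.
-- Average expected rows per value = [All density] * [Rows Sampled].
DBCC SHOW_STATISTICS('dbo.test_table', cix_test_table) WITH DENSITY_VECTOR;

-- Third result set: compare EQ_ROWS (and RANGE_ROWS) per step against that average.
DBCC SHOW_STATISTICS('dbo.test_table', cix_test_table) WITH HISTOGRAM;
```

If EQ_ROWS across the histogram steps hovers near the computed average, the column is fairly uniform; large departures in either direction are the skew you're hunting for.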

If you find that you do have skew, consider creating some filtered statistics on the ranges that have very high RANGE_ROWS, so that you provide additional steps for better estimates on those values.
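A filtered statistic is just a CREATE STATISTICS with a WHERE clause; this sketch assumes a hypothetical hot range in the test table (the statistic name and range boundaries are made up for illustration):

```sql
-- Hypothetical: extra histogram steps over a range the base statistic estimates poorly.
CREATE STATISTICS st_test_table_hot_range
ON dbo.test_table (test_table_id)
WHERE test_table_id >= 700 AND test_table_id < 1200
WITH FULLSCAN;
```

Because the filtered statistic covers far fewer rows, its 200 histogram steps are spent entirely on the problem range, giving the optimizer much finer-grained estimates there.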

Once you have these filtered statistics in place, then you can look at the possibility of manually updating statistics.
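If you do go the manual route, you can opt individual statistics out of automatic updates with NORECOMPUTE rather than disabling auto update for the whole database (again using the test table as a stand-in for your own):

```sql
-- Refresh the statistic now, and mark it so auto update statistics skips it;
-- it then stays frozen until the next manual UPDATE STATISTICS.
UPDATE STATISTICS dbo.test_table cix_test_table WITH FULLSCAN, NORECOMPUTE;
```

This keeps the safety net of auto update for everything else while letting your post-load job own the statistics you've identified as problematic.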