Sql-server – are overlapping multi column statistics redundant

optimizationsql serversql-server-2005statistics

I have run a database engine tuning advisor session, DTA, to optimize my database.

The DTA has produced some statistics recommendations, however, I noticed some are contained in others for example on my table ProductIcon below:

    CREATE STATISTICS [_dta_stat_172123954_1_2_3_4_5_12] 
ON [dbo].[ProductIcon]
([ClassID], [SegmentID], [GroupID], [Tier1], [LanguageID], [Active])
    go

    CREATE STATISTICS [_dta_stat_172123954_11_1_2_3_4_5_12] 
ON [dbo].[ProductIcon]
([IconTypeID], [ClassID], [SegmentID], [GroupID], [Tier1], [LanguageID], [Active])
    go

this is the table definition:

    IF OBJECT_ID('[dbo].[ProductIcon]') IS NOT NULL 
    DROP TABLE [dbo].[ProductIcon] 
    GO
    CREATE TABLE [dbo].[ProductIcon] ( 
    [ClassID]      INT                              NOT NULL,
    [SegmentID]    INT                              NOT NULL,
    [GroupID]      INT                              NOT NULL,
    [Tier1]        VARCHAR(10)                      NOT NULL,
    [LanguageID]   SMALLINT                         NOT NULL  CONSTRAINT [DF_ProductIcon_LanguageID] DEFAULT ((1)),
    [Image]        VARCHAR(50)                      NOT NULL  CONSTRAINT [DF_ProductIcon_Image] DEFAULT (''),
    [Descr]        VARCHAR(100)                     NOT NULL  CONSTRAINT [DF_ProductIcon_Descr] DEFAULT (''),
    [SortOrder]    INT                              NOT NULL  CONSTRAINT [DF__ProductIc__SortO__6A358A53] DEFAULT ((1)),
    [URL]          NVARCHAR(512)                        NULL,
    [LinkJSClick]  NVARCHAR(512)                        NULL,
    [IconTypeID]   INT                                  NULL,
    [Active]       BIT                              NOT NULL  CONSTRAINT [DF_ProductIcon_Active] DEFAULT ((1)),
    [Tier2]        VARCHAR(10)                      NOT NULL  CONSTRAINT [DF_ProductIcon_Tier2] DEFAULT (''),
    CONSTRAINT   [PK_ProductIcon]  
PRIMARY KEY CLUSTERED  
([ClassID] asc, [SegmentID] asc, [GroupID] asc, 
 [Tier1] asc, [LanguageID] asc, [SortOrder] asc, [Tier2] asc))

Question:

If I knew exactly how the stats would be used I would probably be able to decide whether or not I should keep both or one of them.

Is there any way I can have this vision so that I can make my decision as to which one(s) of them I should keep?

we are still on sql 2005 for this one.

This table currently does not have any other index , however the following index is appointed as missing:

CREATE NONCLUSTERED INDEX IDX_IconTypeID_Active_INC_1_2_3_4_5_6 
ON [dbo].[ProductIcon] ([IconTypeID], [Active]) 
INCLUDE (ClassID,SegmentID,GroupID,Tier1,Descr,Tier2) 

Best Answer

They're not redundant, no.

Each statistics object includes a histogram of the values stored in the first listed column. For the remaining columns, only density information is kept at each level (no histogram).

For example, the proposed statistic on (ClassID, SegmentID, GroupID, Tier1, LanguageID, Active) would create:

  • A histogram for ClassID values
  • A density number for:
    • ClassID, SegmentID
    • ClassID, SegmentID, GroupID
    • ClassID, SegmentID, GroupID, Tier1
    • ClassID, SegmentID, GroupID, Tier1, LanguageID
    • ClassID, SegmentID, GroupID, Tier1, LanguageID, Active

These densities can be useful when there is correlation between columns, but only for equality predicates (and you will need SQL Server 2016 to get this benefit when using the 'new' cardinality estimator. The original cardinality estimator used this density information by default).

You can see the histogram and density information using DBCC SHOW_STATISTICS.

Note the order of columns is important. This statistic could provide average density information for equality predicates on (ClassID, SegmentID, and GroupID) but not (ClassID, GroupID).

Statistics are quite lightweight as far as storage is concerned, but you need to be aware of the potential cost of maintaining them, especially on large tables where a high sampling rate is necessary to achieve acceptable quality.

If DTA recommends a statistic, that is a good indication that the statistic would be useful for one or more queries in your workload. I don't recall if DTA reports on the queries that contributed to a recommendation, but if not, for the example above, you would be looking for queries with equality predicates on all the columns listed. Compare the quality of the plan and cardinality estimates with and without the new statistic if you really want to dive into the detail.

The missing index would create an associated statistic on (IconTypeID, Active), with a histogram on IconTypeID, and average density information for the combination (IconTypeID, Active).

See the following for more information: