SQL Server – How to List Missing Multicolumn Statistics

optimizationsql serversql server 2014statistics

I have used the database engine tuning advisor, DTA, in order to improve the performance of one of my databases.

It produced a list of recommendations, mostly multicolumn statistics.

These recommendations need then to be cross checked and tested, before applying them in production.

Statistics are used by the query optimiser (whitepaper on MSDN) to track distribution of values in indexes and/or columns.

Inspired by the following reading:

When is it better to create STATISTICS instead of creating an Index?

MY QUESTION IS

Other than the DTA, is there any other way to find out the missing multicolumn statistics?

Best Answer

Other than the DTA, is there any other way to find out the missing multicolumn statistics?

Nope ! query optimizer assumes that columns within the same table are independent. Joe Sack talks about it here

The query optimizer assumes that columns within the same table are independent. For example, if you have a city and state column, we may intuitively know that these two columns are correlated, but SQL Server does not understand this unless we help it out with an associated multi-column index, or with manually-created multi-column statistics. Without helping the optimizer with correlation, the selectivity of your predicates may be exaggerated.

and @paulwhite talks about it here.

A certain amount of trial-and-error may be necessary to identify the multi-column statistic.

Related Solutions

Sql-server – Non-Clustered Indexes – keys and nonkeys

You are asking the query optimizer to produce a plan that can answer the query:

SELECT *
FROM tbl_pics
WHERE deleted = 0 AND map_id = 1 AND hidden = 0;

The rest is fluff (including the OR pic_id = @pic_id). This will be a table scan, guaranteed, because of the low selectivity of the predicates involved (I'm sure deleted and hidden are 0/1, and map_ip I doubt it has any significant impact). The only predicate that could save the query is pic_id = @pic_id but by placing it in an OR condition you killed it's chance. No secondary index can help it, realistically. The addition of ROW_NUMBER will ad a sort, most likely, but the real damage is the scan.

This is a lost cause. Come up with realistic requirements.

SQL Server – Are Multicolumn Histograms Possible?

Are multicolumn histograms possible?

Not true multi-dimensional histograms, no.

Is this artificial and inelegant solution the only available option to achieve accurate estimations when dealing with filtering by two or more not independent columns?

SQL Server does support "multi-column" statistics, but they only capture average density (correlation) information in addition to a histogram on the first named column. They are only useful for equality comparisons.

The average density information does not capture any detail, so you will get the same selectivity for any pair of values on a two-column statistics object. In some cases, multi-column statistics can be good enough, and better than nothing. Multi-column statistics are automatically built on multi-column indexes.

Depending on SQL Server version, you may also be able to make use of filtered indexes and filtered statistics:

-- Filtered statistics example
CREATE STATISTICS stats_StatsTest_col2_col1_eq_1
ON dbo.StatsTest (col2)
WHERE col1 = 1;

CREATE STATISTICS stats_StatsTest_col2_col1_eq_2
ON dbo.StatsTest (col2)
WHERE col1 = 2;

Or you could build an indexed view (which can support indexes and statistics of its own). Indexed views are the mechanism behind the DATE_CORRELATION_OPTIMIZATION database setting, a little-used feature for inter-table correlations, but which applies to the spirit of the question.

Is the computed column and the filter by the computed column strictly necessary for obtaining actual precision?

It is not the only method. In addition to the things already mentioned, you could also specify the exact textual definition of the computed column and the optimizer will generally match it to statistics on the computed column.

There are also trace flags that change the assumptions made about multi-column correlations. Also, the default correlation assumption in SQL Server 2014 (with the new cardinality estimator enabled) is changed from Independence to Exponential Backoff (more details here and here). Ultimately, this is just a different assumption though. It will be better in many cases, and worse in others.

Exact precision in cardinality estimation is not always needed to obtain a good execution plan. There is always a trade-off between generating a plan that can be reused for different parameter values, and a plan that is optimal for a specific execution, but not reused.

Best Answer

Related Solutions

Sql-server – Non-Clustered Indexes – keys and nonkeys

SQL Server – Are Multicolumn Histograms Possible?

Related Question