Sql-server – When is it better to create STATISTICS instead of creating an Index

best practicesindex-statisticssql serverstatistics

I have found plenty of information on what STATISTICS are: how they are maintained, how they can be created manually or automatically from queries or indexes, and so on. But, I have been unable to find any guidance or "best practices" information regarding when to create them: what situations benefit more from a manually created STATISTICS object than from an Index. I have seen manually created filtered statistics helping queries on partitioned tables (because the statistics created for the indexes cover the whole table and are not per partition–brillaint!), but surely there must be other scenarios that would benefit from a statistics object while not needing the detail of an index, nor worth the cost of maintaining the index or increasing the chances of blocking / dead-locks.

@JonathanFite, in a comment, mentioned a distinction between indexes and statistics:

Indexes will help SQL find the data faster by creating lookups that are sorted differently than the table itself. Statistics help SQL determine how much memory/effort is going to be required to satisfy the query.

That is great info, mostly because it helps me clarify my question:

How does knowing this (or any other technical info on the whats and hows related to the behaviors and nature of STATISTICS) help determine when to choose CREATE STATISTICS over CREATE INDEX, especially when creating an Index will create the related STATISTICS object? What scenario would be better served by having only the STATISTICS info and not having the Index?

It would be super-duper helpful, if possible, to have a working example of a scenario where the STATISTICS object is a better fit than an INDEX.

Since I am a visual learner / thinker, I thought it might help to see the differences between STATISTICS and INDEXes, side-by-side, as a possible means of helping determine when STATISTICS are the better choice.

Thingy           PROs                             CONs
-------          ----------                       -------------------
INDEX            * Can help sorts.                * Takes up space.
                 * Contains data (can             * Needs to be maintained (extra I/O).
                   "cover" a query).              * More chances for blocking / dead-locks.

STATISTICS       * Takes up very little space.    * Cannot help sorts.
                 * Lighter maintenance / won't    * Cannot "cover" queries.
                   slow down DML operations.
                 * Does not increase chances
                   of blocking / dead-locks.

The following are some resources that I found while looking for this, one that even asks this same question, but it was not answered:

SQL Server Index vs Statistic

SQL Server Statistics Questions We Were Too Shy to Ask

Statistics. Are multicolumn histograms possible?

** To be clear, I do not have an answer for this and am actually looking to get feedback from hopefully a few people to provide what appears to be oddly missing information out here in the interwebs.

Best Answer

You question revolves around - When is it a good thing to just create statistics vs create index (which create stats).

From my sql server internals notes (SQLSkills class- IE1 and IE2) and SQL Server internals book, below is my limited understanding :

SQL Server statistics are nothing but system objects that contain vital information about the index key values and regular column values.

SQL Server uses a cost based model to choose a "good enough" execution plan as fast as possible. Cardanility estimation (estimating no. of rows to be processed on each step of the query execution) is the most important factor in query optimization which inturn affects the join strategy, memory grant requirement, worker thread selection as well as choice of indexes when accessing data.

SQL Server wont use nonclustered indexes when it estimates that a large no. of KEY or RID loopup operations will be required, so it maintains statistics on indexes (and on columns) which will help in such estimations.

There are 2 important things about stats :

Histogram stores info about data distribution for the leftmost statistics (index) column ONLY. It also stores info about multi column density of the key values. So essentially, histogram stores data distribution for the leftmost statistics column only.
SQL Server will retain at most 200 steps in histogram irrespective of table size. The intervals covered by each histogram step increase as the table grows which leads to "less accurate" stats for large tables.

Remember that index selectivity is a metric which is inversely proportional to density i.e the more unique values a column has, the higher its selectivity.

When particular queries do not run very often, you can select to create column-level statistics rather than an index. Column-level statistics help Query Optimizer find better execution plans, even though those execution plans are suboptimal due to the index scans involved. At the same time, statistics do not add an overhead during data modification operations, and they help avoid index maintenance. This approach works only for rarely executed queries.

Refer :

Column Statistics Give the Optimizer an Edge by Kalen Delaney
Statistics, Damned Lies, and Statistics – What is Statman? by Conor Cunningham

Note: Someone like Paul White or Aaron Bertrand can chime in to provide more color to your good question.

Related Solutions

Sql-server – Transactional replication creating a fixup index that’s sleeping, but blocking other processes

If replication is automatically creating indexes, then you probably can't avoid it. The only way to do so might be to create a matching index on the subscriber.

Try running the merge agent from the command line with a more verbose logging level and see what information it returns. It might be bumping against something which isn't being returned as an error.

Sql-server – Query optimizer recommends adding index instead of using existing index

Your index is seemingly fine and good (i.e. covering) for the query and it should be used. The real problem is the query itself and specifically this condition which hides an implicit conversion:

WHERE [serialNumber] = 137802

According to SQL Server's datatype precedence, when two values of different datatypes are compared, the value with the datatype of lower precedence is converted to the datatype of the higher precedence. Unfortunately, int is higher in the list than varchar. This blows up any hope of using the index as the column (serialNumber) values are converted to integers. The column being the 1st position of the index, leads the optimizer to not use that index and search for an alternative (and thus the suggestion.)

Solution is not to have any implicit or explicit conversions of columns in WHERE condition. Simply use:

WHERE [serialNumber] = '137802'

Best Answer

Related Solutions

Sql-server – Transactional replication creating a fixup index that’s sleeping, but blocking other processes

Sql-server – Query optimizer recommends adding index instead of using existing index

Related Question