Sql-server – Does SQL Server use (unused) index stats for the optimizer

index-statistics, optimization, sql-server

I am wondering about unused indexes in MS SQL Server. With the index usage DMV I can identify an index which has not been used for seeks, scans or lookups.

However, I know from Oracle that an index which is not used in this way in an execution plan can still contribute statistics/cardinality information to the (Oracle) optimizer. This contribution is not monitored in the same way.

So I am wondering whether in MSSQL an index can have a similar positive effect even when it is not directly used (in a representative time frame). And specifically, whether it can be better than a column statistic (i.e. dropping the index would be harmful).

I haven't seen this mentioned in any of the index tuning articles I have come across, so I assume MSSQL (up to 2017) does not have this concept. Is that correct?

Best Answer

Yes, statistics based on indexes can be used to help with query plan creation even if the underlying index isn't used to access data in the plan. Keep in mind that the query optimizer may evaluate many different query plans and data access paths while creating a query plan. The compiled query plan may end up not using one of the indexes that was considered. That certainly doesn't mean that any query plan that benefited from the statistics of that index needs to be invalidated, right?

An example might be helpful as well. First I'll throw about 6.5 million rows into a heap:

DROP TABLE IF EXISTS dbo.A_GOOD_HEAP;

CREATE TABLE dbo.A_GOOD_HEAP (
    INDEXED_COLUMN BIGINT NULL,
    OTHER_COLUMN BIGINT NULL
);

INSERT INTO dbo.A_GOOD_HEAP WITH (TABLOCK)
SELECT CASE WHEN t.RN % 10 = 0 THEN 0 ELSE 1 END
, RN
FROM
(
    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) RN
    FROM master..spt_values t1
    CROSS JOIN master..spt_values t2
) t;

Next I'll create an index on one of the columns and look at the histogram for the statistics object that is automatically created.

CREATE INDEX IX ON dbo.A_GOOD_HEAP (INDEXED_COLUMN);

DBCC SHOW_STATISTICS ('A_GOOD_HEAP', 'IX');

Here's the histogram:

╔══════════════╦════════════╦═════════╦═════════════════════╦════════════════╗
║ RANGE_HI_KEY ║ RANGE_ROWS ║ EQ_ROWS ║ DISTINCT_RANGE_ROWS ║ AVG_RANGE_ROWS ║
╠══════════════╬════════════╬═════════╬═════════════════════╬════════════════╣
║            0 ║          0 ║  645160 ║                   0 ║              1 ║
║            1 ║          0 ║ 5806440 ║                   0 ║              1 ║
╚══════════════╩════════════╩═════════╩═════════════════════╩════════════════╝
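The same histogram can also be read as a rowset, which makes it easier to compare the index statistic with any column statistics on the table. This is only a sketch: it assumes a version that ships sys.dm_db_stats_histogram (SQL Server 2016 SP1 CU2 or later) and the demo table created above.

```sql
-- List every statistics object on the demo table together with its histogram steps.
-- auto_created = 1 marks column statistics that SQL Server created on its own;
-- the statistic named IX belongs to the index created above.
SELECT s.name AS stats_name,
       s.auto_created,
       h.step_number,
       h.range_high_key,
       h.range_rows,
       h.equal_rows,
       h.distinct_range_rows,
       h.average_range_rows
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_histogram(s.object_id, s.stats_id) AS h
WHERE s.object_id = OBJECT_ID(N'dbo.A_GOOD_HEAP')
ORDER BY s.stats_id, h.step_number;
```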

Based on the statistics there are 5806440 rows in the table with a value of 1 for INDEXED_COLUMN. Now consider this query:

SELECT COUNT(DISTINCT OTHER_COLUMN)
FROM dbo.A_GOOD_HEAP
WHERE INDEXED_COLUMN = 1;

The query optimizer has a few different access paths for the data. It also has a few choices for how to calculate the aggregate. One of the considerations when picking an algorithm for the aggregate is the cardinality estimate of the data. Here's a screenshot of the query plan:

Note that the estimate matches the histogram exactly even though the index isn't used to access data. Newer versions of SQL Server show which statistics were considered during optimization in the query plan. You can see that the statistic associated with the index was used:
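The list of statistics can also be pulled from the cached plan XML rather than from the graphical plan. The following is a rough sketch, assuming the plan is still in cache and that the version in use emits the OptimizerStatsUsage element in showplan XML (it may be absent on older builds or for trivial plans). The LIKE filters on the query text are just an illustrative way to find the demo query.

```sql
-- Shred the OptimizerStatsUsage section of cached plans that touch the demo table.
WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT t.text AS query_text,
       si.value(N'@Table', N'sysname') AS [Table],
       si.value(N'@Statistics', N'sysname') AS [Statistics],
       si.value(N'@ModificationCount', N'bigint') AS ModificationCount,
       si.value(N'@SamplingPercent', N'float') AS SamplingPercent
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS t
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) AS qp
CROSS APPLY qp.query_plan.nodes('//OptimizerStatsUsage/StatisticsInfo') AS s(si)
WHERE t.text LIKE N'%A_GOOD_HEAP%'
  AND t.text NOT LIKE N'%dm_exec_query_stats%'; -- skip this diagnostic query itself
```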

However, the sys.dm_db_index_usage_stats DMV doesn't report any end user activity on the index.
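To check this on your own system, a query along these lines shows the usage counters for the heap and for IX side by side. It is a minimal sketch that assumes it runs in the database holding the demo table; an index with no recorded activity simply has no row in the DMV, so its counters come back as NULL.

```sql
-- Usage counters for the heap (index_id 0, name NULL) and every index on the demo table.
SELECT i.index_id,
       i.name AS index_name,
       i.type_desc,
       us.user_seeks,
       us.user_scans,
       us.user_lookups,
       us.user_updates
FROM sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS us
       ON us.object_id = i.object_id
      AND us.index_id = i.index_id
      AND us.database_id = DB_ID()
WHERE i.object_id = OBJECT_ID(N'dbo.A_GOOD_HEAP')
ORDER BY i.index_id;
```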

Related Solutions

Mysql – From where does the MySQL Query Optimizer read index statistics

The direct answer for this would be

information_schema.statistics

mysql> desc information_schema.statistics;
+---------------+---------------+------+-----+---------+-------+
| Field         | Type          | Null | Key | Default | Extra |
+---------------+---------------+------+-----+---------+-------+
| TABLE_CATALOG | varchar(512)  | NO   |     |         |       |
| TABLE_SCHEMA  | varchar(64)   | NO   |     |         |       |
| TABLE_NAME    | varchar(64)   | NO   |     |         |       |
| NON_UNIQUE    | bigint(1)     | NO   |     | 0       |       |
| INDEX_SCHEMA  | varchar(64)   | NO   |     |         |       |
| INDEX_NAME    | varchar(64)   | NO   |     |         |       |
| SEQ_IN_INDEX  | bigint(2)     | NO   |     | 0       |       |
| COLUMN_NAME   | varchar(64)   | NO   |     |         |       |
| COLLATION     | varchar(1)    | YES  |     | NULL    |       |
| CARDINALITY   | bigint(21)    | YES  |     | NULL    |       |
| SUB_PART      | bigint(3)     | YES  |     | NULL    |       |
| PACKED        | varchar(10)   | YES  |     | NULL    |       |
| NULLABLE      | varchar(3)    | NO   |     |         |       |
| INDEX_TYPE    | varchar(16)   | NO   |     |         |       |
| COMMENT       | varchar(16)   | YES  |     | NULL    |       |
| INDEX_COMMENT | varchar(1024) | NO   |     |         |       |
+---------------+---------------+------+-----+---------+-------+
16 rows in set (0.01 sec)

You could SELECT from that table with

SELECT * FROM information_schema.statistics
WHERE table_schema='mydb' AND table_name='mytable';

or see the statistics by doing

SHOW INDEXES FROM mydb.mytable;

Please keep in mind that this table is not always accurate in a write-heavy environment. Periodically you will have to run ANALYZE TABLE against all MyISAM tables that are updated frequently. Otherwise, the MySQL Query Optimizer, which relies on the statistics exposed in information_schema.statistics, can sometimes make bad choices when building execution plans for queries. Index statistics must be as up-to-date as possible.

ANALYZE TABLE has ABSOLUTELY NO lasting EFFECT against InnoDB tables in the MySQL versions current at the time of writing (5.5 and earlier). There, all index statistics for InnoDB are computed on demand by means of dives into the BTREE pages. Therefore, when you run SHOW INDEXES FROM against an InnoDB table, the cardinalities displayed are always approximations. From MySQL 5.6 onward, InnoDB persists optimizer statistics by default (innodb_stats_persistent), and ANALYZE TABLE does recalculate them.

UPDATE 2011-06-21 12:17 EDT

For clarification of ANALYZE TABLE, let me rephrase. Running ANALYZE TABLE on InnoDB tables that use these on-demand statistics is completely useless. Even if you ran ANALYZE TABLE on such an InnoDB table, the InnoDB storage engine performs dives into the index for cardinality approximations over and over again, thus trashing the statistics you just compiled. In fact, Percona performed some tests on ANALYZE TABLE and came to that conclusion as well.

Sql-server – SQL Server 2008 R2 DMV Question

use dbx;
select foo
from db1.dbo.table
join db2.dbo.table on condition
where some_function();

This query consumed lots of CPU and requested a large memory grant. In which database? The information you want simply doesn't exist as a concept. As a human with inside knowledge and the benefit of hindsight, you would probably be able to explain why 75% of the CPU is due to db1 and 15% is due to db2. But ultimately you just can't assign queries to a database. The fact that some (OK, most) queries are 100% contained inside one database does not mean that all query resources can be assigned deterministically to a database.

However, for practical purposes it is relatively simple to automate exactly what you did in your post: inspect the plans, identify the location of every physical access operator, and use this information to assign the query resources to a database.

with xmlnamespaces (default 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
select x.value(N'@NodeId',N'int') as NodeId
    , x.value(N'@PhysicalOp', N'sysname') as PhysicalOp
    , x.value(N'@LogicalOp', N'sysname') as LogicalOp
    , ox.value(N'@Database',N'sysname') as [Database]
    , ox.value(N'@Schema',N'sysname') as [Schema]
    , ox.value(N'@Table',N'sysname') as [Table]
    , ox.value(N'@Index',N'sysname') as [Index]
    , ox.value(N'@IndexKind',N'sysname') as [IndexKind]
    , x.value(N'@EstimateRows', N'float') as EstimateRows
    , x.value(N'@EstimateIO', N'float') as EstimateIO
    , x.value(N'@EstimateCPU', N'float') as EstimateCPU
    , x.value(N'@AvgRowSize', N'float') as AvgRowSize
    , x.value(N'@TableCardinality', N'float') as TableCardinality
    , x.value(N'@EstimatedTotalSubtreeCost', N'float') as EstimatedTotalSubtreeCost
    , x.value(N'@Parallel', N'tinyint') as DOP
    , x.value(N'@EstimateRebinds', N'float') as EstimateRebinds
    , x.value(N'@EstimateRewinds', N'float') as EstimateRewinds
    , st.*
    , pl.query_plan
from sys.dm_exec_query_stats as st
cross apply sys.dm_exec_query_plan (st.plan_handle) as pl
cross apply pl.query_plan.nodes('//RelOp[./*/Object/@Database]') as op(x)
cross apply op.x.nodes('./*/Object') as ob(ox)
