Sql-server – Does SQL Server use (unused)index stats for optimizer

index-statisticsoptimizationsql server

I am wondering about unused index in MS SQL Server. By the Index usage DMV I can identify an index which has not been used for seeks, scans or lookups.

However I know from Oracle that an index might not be used in such a way in a execution plan, however it can still contribute statistics/cardinality information to the (Oracle) optimizer. This contribution is not monitored in the same way.

So I am wondering if in MSSQL a Index can have a similar positive effect even when it is not directly used (in a representative time frame)? And specifically, if it can be better than a column statistic (I.e. dropping the index would be harmful).

I haven’t seen this mentioned in any of the index tuning articles I have come along, so I assume MSSQL (up to 2017) does not have this concept, is that correct?

Best Answer

Yes, statistics based on indexes can be used to help with query plan creation even if the underlying index isn't used to access data in the plan. Consider that the query optimizer may consider many different query plans and data access paths while creating a query plan. The compiled query plan may end up not using one of the indexes that was considered. That certainly doesn't mean that any query plan that benefited from the statistics of that index needs to be invalidated, right?

An example might be helpful as well. First I'll throw about 6.5 million rows into a heap:

DROP TABLE IF EXISTS dbo.A_GOOD_HEAP;

CREATE TABLE dbo.A_GOOD_HEAP (
    INDEXED_COLUMN BIGINT NULL,
    OTHER_COLUMN BIGINT NULL
);

INSERT INTO dbo.A_GOOD_HEAP WITH (TABLOCK)
SELECT CASE WHEN t.RN % 10 = 0 THEN 0 ELSE 1 END
, RN
FROM
(
    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) RN
    FROM master..spt_values t1
    CROSS JOIN master..spt_values t2
) t;

Next I'll create an index on one of the columns and look at the histogram for the statistics object that is automatically created.

CREATE INDEX IX ON dbo.A_GOOD_HEAP (INDEXED_COLUMN);

DBCC SHOW_STATISTICS ('A_GOOD_HEAP', 'IX');

Here's the histogram:

╔══════════════╦════════════╦═════════╦═════════════════════╦════════════════╗
║ RANGE_HI_KEY ║ RANGE_ROWS ║ EQ_ROWS ║ DISTINCT_RANGE_ROWS ║ AVG_RANGE_ROWS ║
╠══════════════╬════════════╬═════════╬═════════════════════╬════════════════╣
║            0 ║          0 ║  645160 ║                   0 ║              1 ║
║            1 ║          0 ║ 5806440 ║                   0 ║              1 ║
╚══════════════╩════════════╩═════════╩═════════════════════╩════════════════╝

Based on the statistics there are 5806440 rows in the table with a value of 1 for INDEXED_COLUMN. Now consider this query:

SELECT COUNT(DISTINCT OTHER_COLUMN)
FROM dbo.A_GOOD_HEAP
WHERE INDEXED_COLUMN = 1;

The query optimizer has a few different access paths for the data. It also has a few choices for how to calculate the aggregate. One of the considerations for the picking an algorithm for the agggregate is the cardinality estimate of the data. Here's a screenshot of the query plan:

enter image description here

Note that the estimate matches the histogram exactly even though the index isn't used to access data. Newer versions of SQL Server show which statistics were considered during optimization in the query plan. You can see that the statistic associated with the index was used:

enter image description here

However, the sys.dm_db_index_usage_stats dmv doesn't report any end user activity on the index.