Sql-server – Workaround for the limit of the number of missing indexes suggested by SQL Server’s DMVs

dmvindexperformancequery-performancesql server

In production, each instance of SQL Server has over 250 databases (what we call 'OrgDb's). The project I'm currently working on, aims to send all the reported missing indexes by SQL Server's DMVs to telemetry, in order to do some post analysis on how well the queries sent to these orgDbs are doing and possibly do some optimizations.

Sounds straightforward, right? The problem however, is that there is a max limit of 500 for the number of missing indexes that DMVs can report in a single SQL Server whereas we expect around 20 missing indexes per OrgDb (~ 5000 in total).

Can anyone think of a workaround for this limit? One solution I initially thought of, was to delete the DMVs' missing index tables:

sys.dm_db_missing_index_details
sys.dm_db_missing_index_groups
sys.dm_db_missing_index_group

after each update, but turns out those tables cannot be modified:

Error:Ad hoc updates to system catalogs are not allowed.

Best Answer

You can't reset the DMVs, however you can work around this limitation and remove rows from the DMVs by creating a small filtered index on the tables mentioned in the DMVs then immediately dropping that index.

For instance:

CREATE INDEX IX_temp
ON dbo.SomeTable(SomeKey)
WHERE SomeKey IS NULL;

DROP INDEX dbo.SomeTable.IX_temp;

I've created a script to automate this process.

IF OBJECT_ID('dbo.RemoveMissingIndexSuggestions') IS NOT NULL
DROP PROCEDURE RemoveMissingIndexSuggestions;
GO
CREATE PROCEDURE dbo.RemoveMissingIndexSuggestions
(
    @Database SYSNAME = NULL --optional, if NULL, clear all suggestions
                             --if specified, only clear suggestions for that database
    , @Table SYSNAME = NULL --if not NULL, only clear suggestions for the specified table 
)
AS
BEGIN
    /*
        Max Vernon, 2016-04-08
        Inspired by work by Joe Sack and Glenn Berry at
        http://www.sqlskills.com/blogs/joe/clearing-missing-index-suggestions-for-a-single-table/

        Creates one index for each table that is mentioned in sys.dm_db_missing_index_details
        then promply drops that index.  The index is created with a WHERE clause that is likely 
        to eliminate all or almost all rows, and therefore will be created quite quickly.
    */
    SET NOCOUNT ON;
    DECLARE @ObjectName SYSNAME;
    DECLARE @DatabaseName SYSNAME;
    DECLARE @CreateStmt NVARCHAR(MAX);
    DECLARE @DropStmt NVARCHAR(MAX);
    DECLARE @stmt NVARCHAR(MAX);
    DECLARE @msg NVARCHAR(2000);
    DECLARE @vars NVARCHAR(1000);
    DECLARE @Uniquifier NVARCHAR(48);
    SET @vars = '@stmt NVARCHAR(MAX)';
    DECLARE cur CURSOR LOCAL FORWARD_ONLY STATIC
    FOR
    WITH cte AS
    (
        SELECT ObjectName = d.name + '.' + s.name + '.' + o.name
            , DatabaseName = d.name
            , CreateStmt = N'CREATE INDEX [IX_temp] 
ON ' + QUOTENAME(s.name) + N'.' + QUOTENAME(o.name) + N'(' + mid.equality_columns + N') 
WHERE ' 
              + (
                SELECT TOP(1) cols.ColName FROM (
                    SELECT TOP(1) ColName = QUOTENAME(c.name) + N' IS NULL'
                    FROM sys.columns c 
                        INNER JOIN sys.key_constraints kc ON c.object_id = kc.parent_object_id 
                    WHERE c.object_id = o.object_id 
                        AND kc.type_desc = N'PRIMARY_KEY_CONSTRAINT'
                    UNION ALL
                    SELECT TOP(1) QUOTENAME(c.name) + N' = -2147483648'
                    FROM sys.columns c
                        INNER JOIN sys.types ty ON c.system_type_id = ty.system_type_id
                    WHERE ty.name IN 
                        (
                              N'bigint'
                            , N'binary'
                            , N'hierarchyid'
                            , N'int'
                            , N'uniqueidentifier'
                            , N'varbinary'
                        )
                    ) cols
                ) 
                + ';'
                , DropStmt = N'DROP INDEX ' + QUOTENAME(s.name) + N'.' + QUOTENAME(o.name) + '.[IX_temp];'
                , rn = ROW_NUMBER() OVER (PARTITION BY mid.object_id ORDER BY mid.index_handle)
        FROM sys.dm_db_missing_index_details mid
            INNER JOIN sys.objects o ON mid.object_id = o.object_id
            INNER JOIN sys.schemas s ON o.schema_id = s.schema_id
            INNER JOIN sys.databases d ON mid.database_id = d.database_id
        WHERE o.name NOT LIKE '#%' -- ignore temp tables
            AND (d.name = @Database OR @Database IS NULL)
            AND (o.name = @Table OR @Table IS NULL)
    )
    SELECT cte.ObjectName
        , cte.DatabaseName
        , cte.CreateStmt
        , cte.DropStmt
    FROM cte
    WHERE rn = 1;
    OPEN cur;
    FETCH NEXT FROM cur INTO @ObjectName, @DatabaseName, @CreateStmt, @DropStmt;
    WHILE @@FETCH_STATUS = 0
    BEGIN
        SET @msg = 'Flushing ' + @ObjectName + ' indexes.

';
        RAISERROR (@msg, 0, 1) WITH NOWAIT;
        SET @stmt = 'EXEC ' + QUOTENAME(@DatabaseName) + '.sys.sp_executesql @stmt;'
        SET @Uniquifier = CONVERT(NVARCHAR(48), NEWID(), 0);
        SET @CreateStmt = REPLACE(@CreateStmt, '[IX_Temp]', '[IX_Temp_' + @Uniquifier + ']')
        SET @DropStmt = REPLACE(@DropStmt, '[IX_Temp]', '[IX_Temp_' + @Uniquifier + ']')
        SET @CreateStmt = 'SET ANSI_NULLS ON;
SET ANSI_PADDING ON;
SET ANSI_WARNINGS ON;
SET ARITHABORT ON;
SET CONCAT_NULL_YIELDS_NULL ON;
SET NUMERIC_ROUNDABORT OFF;
SET QUOTED_IDENTIFIER ON;
' + @CreateStmt + '

';
        SET @DropStmt = 'SET ANSI_NULLS ON;
SET ANSI_PADDING ON;
SET ANSI_WARNINGS ON;
SET ARITHABORT ON;
SET CONCAT_NULL_YIELDS_NULL ON;
SET NUMERIC_ROUNDABORT OFF;
SET QUOTED_IDENTIFIER ON;
' + @DropStmt + '

';
        RAISERROR (@CreateStmt, 0, 1) WITH NOWAIT;
        RAISERROR (@DropStmt, 0, 1) WITH NOWAIT;
        EXEC sp_executesql @stmt, @vars, @stmt = @CreateStmt;
        EXEC sp_executesql @stmt, @vars, @stmt = @DropStmt;
        FETCH NEXT FROM cur INTO @ObjectName, @DatabaseName, @CreateStmt, @DropStmt;
    END
    CLOSE cur;
    DEALLOCATE cur;
END
GO

This can be executed with the @Database parameter set to the name of a database to eliminate recommended indexes only related to that database, or without parameters to eliminate all recommended indexes. One can optionally limit this to the suggestions for a single table by passing the name of the table into the @Table parameter.

EXEC dbo.RemoveMissingIndexSuggestions @Database = 'tempdb', @Table = 'SomeTable';

It creates at most one index per table. The index has a unique name and is constructed using a single column, giving priority to using the table's PRIMARY KEY, if it has one. Tables will be missed by this proc if the table does not have a primary key or at least one of the following types of columns:

bigint
binary
hierarchyid
int
uniqueidentifier
varbinary

I've written a short blog post on this issue over at SQLServerScience

Related Solutions

Sql-server – Ok to drop indexes on FK’s if they have no stats in the DMVS

As long as you are sure the server was up for that entire time, and that nobody cleared out DMV stats inadvertently. This can happen if the database is detached + re-attached / restored / offline + online / auto-close + online, or if the index has been explicitly dropped / re-created (the DMV is not affected by disable / rebuild / reorganize, except in the case of SQL Server 2012, where rebuild currently clears the stats - I suspect that will be fixed - thanks @MartinSmith).

You should expect that if a DML activity used the index in a read capacity for some reason (e.g. to perform an update on some other table), this would register as a read activity, not a write. All of the writes you see are index maintenance.

As an aside, here is a slightly more efficient way to write this query. I removed the twice-referenced correlated subquery and dropped the unnecessary joins to sys.objects and sys.schemas. I also fixed a few minor syntax things, such as prefixing all columns with aliases where appropriate, placing reserved words in square brackets, and removing AS 'column alias' syntax. I found it confusing that half way through the SELECT list you changed from alias = expression syntax to expression AS alias syntax - you should pick one and be consistent IMHO. I changed these both for majority within the query and also according to my personal preference. :-)

SELECT name = OBJECT_NAME(s.[object_id])
    , indexname = i.name
    , i.index_id   
    , reads = s.user_seeks + s.user_scans + s.user_lookups   
    , writes = s.user_updates   
    , g.[rows]
    , reads_per_write = CASE
        WHEN s.user_updates < 1 THEN 100.0
        ELSE 1.00 * (s.user_seeks + s.user_scans + s.user_lookups) / s.user_updates
      END
    , [drop statement] = 'DROP INDEX ' + QUOTENAME(i.name) 
      + ' ON ' + QUOTENAME(OBJECT_SCHEMA_NAME(s.[object_id])) 
      + '.'    + QUOTENAME(OBJECT_NAME(s.[object_id]))
    FROM sys.dm_db_index_usage_stats AS s  
    INNER JOIN sys.indexes AS i 
      ON i.index_id = s.index_id 
      AND s.object_id = i.object_id   
    INNER JOIN 
    (
      SELECT [object_id], index_id, [rows] = SUM([rows]) 
        FROM sys.partitions GROUP BY [object_id], index_id
        HAVING SUM([rows]) > 2000
    ) AS g
      ON i.[object_id] = g.[object_id] 
      AND i.index_id = g.index_id
    WHERE OBJECTPROPERTY(s.[object_id], 'IsUserTable') = 1
    AND s.database_id = DB_ID()   
    AND i.type_desc = 'nonclustered'
    AND i.is_primary_key = 0
    AND i.is_unique_constraint = 0
    ORDER BY name, reads;

Mysql – How to optimize a very slow query with joins and group by

OK, I ended up adding another lookup table:

CREATE TABLE IF NOT EXISTS `stops_routes` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `stop_id` varchar(100) NOT NULL,
  `route_id` varchar(100) NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `stop_route` (`stop_id`,`route_id`),
  KEY `stop_id` (`stop_id`),
  KEY `route_id` (`route_id`)
) ENGINE=InnoDB  DEFAULT CHARSET=utf8;

Filling it was fairly fast:

mysql> insert into stops_routes (stop_id, route_id)
    ->
    -> select
    ->     s.stop_id,
    ->     r.route_id as from_route_id
    ->
    -> from routes r
    -> left join trips t on t.route_id = r.route_id
    -> left join stop_times st on st.trip_id = t.trip_id
    -> left join stops s on s.stop_id = st.stop_id
    -> group by s.stop_id, r.route_id;
Query OK, 3496 rows affected (8.38 sec)
Records: 3496  Duplicates: 0  Warnings: 0

Using it is blazingly fast:

mysql> select
    ->     r.route_id as from_route_id,
    ->     c_sr.route_id as to_route_id
    ->
    -> from routes r
    ->
    -> left join stops_routes sr on sr.route_id = r.route_id
    -> left join stop_connections c_s on c_s.from_stop_id = sr.stop_id
    -> left join stops_routes c_sr on c_sr.stop_id = c_s.to_stop_id
    ->
    -> where r.route_id <> c_sr.route_id
    -> group by r.route_id, c_sr.route_id
    -> limit 10;
+---------------+-------------+
| from_route_id | to_route_id |
+---------------+-------------+
| 0001          | 0002        |
| 0001          | 0003        |
| 0001          | 0004        |
| 0001          | 0005        |
| 0001          | 0006        |
| 0001          | 0008        |
| 0001          | 0009        |
| 0001          | 0011        |
| 0001          | 0014        |
| 0001          | 0031        |
+---------------+-------------+
10 rows in set (0.63 sec)

Now I can fill my last lookup table (set of connections between every routes on my GTFS network):

mysql> insert into route_connections (from_route_id, to_route_id)
    -> select
    ->     r.route_id as from_route_id,
    ->     c_sr.route_id as to_route_id
    ->
    -> from routes r
    ->
    -> left join stops_routes sr on sr.route_id = r.route_id
    -> left join stop_connections c_s on c_s.from_stop_id = sr.stop_id
    -> left join stops_routes c_sr on c_sr.stop_id = c_s.to_stop_id
    ->
    -> where r.route_id <> c_sr.route_id
    -> group by r.route_id, c_sr.route_id;
Query OK, 2848 rows affected (0.31 sec)
Records: 2848  Duplicates: 0  Warnings: 0

Amazingly fast. I guess the engine couldn't break up the steps to optimize this.
I'd still be interested to know if it would be possible to get the same result (from route to route connections table) using only one sub-second or sub-minute query.

Best Answer

Related Solutions

Sql-server – Ok to drop indexes on FK’s if they have no stats in the DMVS

Mysql – How to optimize a very slow query with joins and group by

Related Question