Sql-server – ny reason for an index to contain the primary key

index-tuningsql server

I inherited a project using SQL Server 2005. Recently there were some performance problems and I started looking at the indexes, and found

_dta_index_survey_25_135059617__K1_K10_K19:
    id ASC, sent_date ASC, group_id ASC

and

_dta_index_survey_21_364632442__K18_K1_K2_K9:
    group_id ASC, id ASC, campaign_id ASC, sent_date ASC

which appears to be generated by some kind of automatic performance tuning tool. I think SQL Server comes with something like that but I'm not familiar with it.

Anyway, id is the primary key, and what I'm trying to understand here is how it could possibly be beneficial to include the primary key within a secondary index… because you have to have the leftmost pieces of information to use an index and if you know id then you've got the row anyway.

Can someone explain why a tool would have generated these indexes and if there's any point in keeping them?

Just because the indexes are getting used doesn't mean they're a good design. SQL Server is using the wrong indexes lately, and I'm trying to determine why. More specifically, queries against this table that ran in milliseconds a month ago are now taking multiple seconds. The table is huge (4M rows), and growing steadily, but at the same pace it's been going at for years.

Almost every query joins across group_id or campaign_id. Often both. The application almost never specifically selects or sorts by the PK, but I could see how ID could be useful when comparing or merging lists of IDs within the database engine. … But id is also an identity column, and an index holds lists of row ids, so it still seems redundant to have the ID in the key when it's also in the index "buckets". And the additional fields could still be covered by the index without being part of the key.

I discovered that (id, sent_date, group_id) hardly ever gets read. Meanwhile (group_id, id, campaign_id, sent_date) is one of the hottest indexes. Next I found that an index which I thought was pretty reasonable (campaign_id, sent_date, group_id, id) also has a index_columns_include of every other column in the table, and occupies more space than the primary clustered index on id! (but it's also pretty hot…) Surely I can replace it with a simple (campaign_id, sent_date, group_id)?

Best Answer

Tuning tools use a shotgun approach. They put all combinations of indexes as a hypothetical indexes. A hypothetical index does not create any of the indexes but it certain type of hypothetical does keep stats on them by how many reads and writes would have taken place. Then the tuning tool gets the hypothetical with the top stats and gives them to you and destroys all hypothetical.

So someone put in indexes from a tuning tool, but I would prefer to get the actual statistics that are kept since the last time SQL Server was restarted. Below is sample code to get the statistics. Look at total reads and compare to total writes. More writes than reads is bad. If I find an index with zero reads, I drop them and I have never had to put them back.

What is the clustered index?
How many indexes? Anything over 5 indexes is usually to many.

SELECT distinct
  DB_Name(DB_id()) as DbName
  , oo.name AS object_name
  , stat.user_updates AS [Total Writes]
  , stat.user_seeks + stat.user_scans + stat.user_lookups AS [Total Reads]
  , stat.user_updates - (stat.user_seeks + stat.user_scans + stat.user_lookups) AS [Difference]
  , partition1.Rows
  , partition1.SizeMB
  , iib.name AS index_name
  , iib.type_desc
  , FilegroupName = fg.groupname
  , ISNULL(IndexColumns.index_columns_key, '---') AS index_columns_key
  , ISNULL(IndexColumns.index_columns_include, '---') AS index_columns_include
  , iib.is_primary_key
  , iib.is_unique
  , iib.is_unique_constraint
  , iib.is_hypothetical
  , STATS_DATE ( oo.[object_id] , iib.index_id ) as IndexCreatedDate
  , iib.fill_factor
FROM sys.objects                      AS oo  WITH (NOLOCK)
JOIN sys.schemas                      AS ss  WITH (NOLOCK) ON oo.schema_id=ss.schema_id
JOIN sys.indexes                      AS iib WITH (NOLOCK) ON oo.object_id=iib.object_id
left JOIN sys.dm_db_index_usage_stats AS stat WITH (NOLOCK) ON stat.[object_id] = iib.[object_id] AND iib.index_id = stat.index_id
left JOIN sys.partitions              AS pp WITH (NOLOCK) ON iib.object_id = pp.OBJECT_ID AND iib.index_id = pp.index_id
left JOIN sys.allocation_units        AS aa WITH (NOLOCK) ON pp.partition_id = aa.container_id
left JOIN sys.sysfilegroups           AS fg WITH (NOLOCK) ON fg.groupid = aa.data_space_id
left JOIN (
 SELECT
   object_id, index_id, SUM(row_count) AS Rows,
   CONVERT(numeric(19,3), CONVERT(numeric(19,3), SUM(in_row_reserved_page_count+lob_reserved_page_count+row_overflow_reserved_page_count))/CONVERT(numeric(19,3), 128)) AS SizeMB
 FROM sys.dm_db_partition_stats WITH (NOLOCK)
 GROUP BY object_id, index_id
 ) AS partition1 ON iib.object_id=partition1.object_id AND iib.index_id=partition1.index_id
 CROSS APPLY
 (
    SELECT
    (
      SELECT sys.columns.name + ', '
      FROM sys.index_columns WITH (NOLOCK)
      JOIN sys.columns WITH (NOLOCK) ON sys.index_columns.column_id=sys.columns.column_id
                       AND sys.index_columns.object_id=sys.columns.object_id
      WHERE sys.index_columns.is_included_column=0
      AND iib.object_id=sys.index_columns.object_id AND iib.index_id=sys.index_columns.index_id
      ORDER BY key_ordinal
      FOR XML PATH('')
    ) AS index_columns_key,
    (
      SELECT sys.columns.name + ', '
      FROM sys.index_columns WITH (NOLOCK)
      JOIN sys.columns WITH (NOLOCK) ON sys.index_columns.column_id=sys.columns.column_id
                       AND sys.index_columns.object_id=sys.columns.object_id
      WHERE sys.index_columns.is_included_column=1
      AND iib.object_id=sys.index_columns.object_id AND iib.index_id=sys.index_columns.index_id
      ORDER BY index_column_id
      FOR XML PATH('')
    ) AS index_columns_include
 ) AS IndexColumns
WHERE stat.database_id = DB_ID()
and oo.is_ms_shipped = 0
and oo.name not in ( 'sysdiagrams' )
and iib.is_hypothetical = 0
ORDER BY oo.name, iib.name

The index you mention will probably work. But I prefer to put an index in as an experiment and use the script above to get factual measurements as to how good an index is.

Related Solutions

Sql-server – Should these two indexes suggested by the SSMS missing index feature be combined

Well, you could consider a filtered index - if you're always looking for rows where IsSynchronized = 0 and this number should be relatively small, then instead of those two indexes, consider this instead:

CREATE NONCLUSTERED INDEX [IX_NotSynchronized] 
  ON [dbo].[PackageEvents] ([PackageID])  
  INCLUDE ([EventDate], [EventDescription], [EventID], 
    [LastSyncDate], [Notes], [UserName], [Version]) 
  WHERE IsSynchronized = 0;

Of course you may want to make that even smaller and test to see the difference in impact if the query has to look up the data (should be pretty efficient if the number of rows is small), so - assuming PackageID is the clustering key:

CREATE NONCLUSTERED INDEX [IX_NotSynchronized] 
  ON [dbo].[PackageEvents] ([PackageID])
  WHERE IsSynchronized = 0;

The overhead of maintaining this index may very well be worth the space savings compared to a full index, especially if it's only being used to optimize this query (or query pattern, at least).

Filtered indexes are not magic, though; JNK brought up some limitations below:

Caveats with filtered indexes - stats may not stay up to date without maintenance, and you need to use "standard" values for some settings like QUOTED IDENTIFIER and ANSI NULLS. These are small issues but if you have the settings wrong in a session that inserts into the index, the insert will fail.

Also you'll want to read these posts:

If you don't want to use a filtered index, you can probably test variations of these:

CREATE NONCLUSTERED INDEX [IX_Covering_try1] ON [dbo].[PackageEvents] 
  ([PackageID], IsSynchronized)  
INCLUDE ([EventDate], [EventDescription], [EventID], 
  [LastSyncDate], [Notes], [UserName], [Version]);

CREATE NONCLUSTERED INDEX [IX_Covering_try2] ON [dbo].[PackageEvents] 
  (IsSynchronized, [PackageID])  
INCLUDE ([EventDate], [EventDescription], [EventID], 
  [LastSyncDate], [Notes], [UserName], [Version]);

(For a long time I thought that including BIT columns in the key was wasteful but Martin Smith demonstrated a case where it worked quite well - worth a try. I can't find the post now.)

Without your full schema, data, query patterns etc. we can only guide you and have you test our suggestions in your environment. We can't say, "Ding! This is the one that will work for you!"

Sql-server – Duplicate non-clustered index on Primary Key, can I drop one of them

I think it's safe to remove the non-primary key as it's an exact duplicate of the primary key.

If one of them is used more and takes less space, did you check the fragmentation and last statistics date on both? Maybe one of them is more fragmented or has better stats.

Best Answer

Related Solutions

Sql-server – Should these two indexes suggested by the SSMS missing index feature be combined

Sql-server – Duplicate non-clustered index on Primary Key, can I drop one of them

Related Question