Sql-server – ny reason for an index to contain the primary key

index-tuningsql server

I inherited a project using SQL Server 2005. Recently there were some performance problems and I started looking at the indexes, and found

_dta_index_survey_25_135059617__K1_K10_K19:
    id ASC, sent_date ASC, group_id ASC

and

_dta_index_survey_21_364632442__K18_K1_K2_K9:
    group_id ASC, id ASC, campaign_id ASC, sent_date ASC

which appears to be generated by some kind of automatic performance tuning tool. I think SQL Server comes with something like that but I'm not familiar with it.

Anyway, id is the primary key, and what I'm trying to understand here is how it could possibly be beneficial to include the primary key within a secondary index… because you have to have the leftmost pieces of information to use an index and if you know id then you've got the row anyway.

Can someone explain why a tool would have generated these indexes and if there's any point in keeping them?

Just because the indexes are getting used doesn't mean they're a good design. SQL Server is using the wrong indexes lately, and I'm trying to determine why. More specifically, queries against this table that ran in milliseconds a month ago are now taking multiple seconds. The table is huge (4M rows), and growing steadily, but at the same pace it's been going at for years.

Almost every query joins across group_id or campaign_id. Often both. The application almost never specifically selects or sorts by the PK, but I could see how ID could be useful when comparing or merging lists of IDs within the database engine. … But id is also an identity column, and an index holds lists of row ids, so it still seems redundant to have the ID in the key when it's also in the index "buckets". And the additional fields could still be covered by the index without being part of the key.

I discovered that (id, sent_date, group_id) hardly ever gets read. Meanwhile (group_id, id, campaign_id, sent_date) is one of the hottest indexes. Next I found that an index which I thought was pretty reasonable (campaign_id, sent_date, group_id, id) also has a index_columns_include of every other column in the table, and occupies more space than the primary clustered index on id! (but it's also pretty hot…) Surely I can replace it with a simple (campaign_id, sent_date, group_id)?

Best Answer

Tuning tools use a shotgun approach. They put all combinations of indexes as a hypothetical indexes. A hypothetical index does not create any of the indexes but it certain type of hypothetical does keep stats on them by how many reads and writes would have taken place. Then the tuning tool gets the hypothetical with the top stats and gives them to you and destroys all hypothetical.

So someone put in indexes from a tuning tool, but I would prefer to get the actual statistics that are kept since the last time SQL Server was restarted. Below is sample code to get the statistics. Look at total reads and compare to total writes. More writes than reads is bad. If I find an index with zero reads, I drop them and I have never had to put them back.

  1. What is the clustered index?
  2. How many indexes? Anything over 5 indexes is usually to many.

SELECT distinct
  DB_Name(DB_id()) as DbName
  , oo.name AS object_name
  , stat.user_updates AS [Total Writes]
  , stat.user_seeks + stat.user_scans + stat.user_lookups AS [Total Reads]
  , stat.user_updates - (stat.user_seeks + stat.user_scans + stat.user_lookups) AS [Difference]
  , partition1.Rows
  , partition1.SizeMB
  , iib.name AS index_name
  , iib.type_desc
  , FilegroupName = fg.groupname
  , ISNULL(IndexColumns.index_columns_key, '---') AS index_columns_key
  , ISNULL(IndexColumns.index_columns_include, '---') AS index_columns_include
  , iib.is_primary_key
  , iib.is_unique
  , iib.is_unique_constraint
  , iib.is_hypothetical
  , STATS_DATE ( oo.[object_id] , iib.index_id ) as IndexCreatedDate
  , iib.fill_factor
FROM sys.objects                      AS oo  WITH (NOLOCK)
JOIN sys.schemas                      AS ss  WITH (NOLOCK) ON oo.schema_id=ss.schema_id
JOIN sys.indexes                      AS iib WITH (NOLOCK) ON oo.object_id=iib.object_id
left JOIN sys.dm_db_index_usage_stats AS stat WITH (NOLOCK) ON stat.[object_id] = iib.[object_id] AND iib.index_id = stat.index_id
left JOIN sys.partitions              AS pp WITH (NOLOCK) ON iib.object_id = pp.OBJECT_ID AND iib.index_id = pp.index_id
left JOIN sys.allocation_units        AS aa WITH (NOLOCK) ON pp.partition_id = aa.container_id
left JOIN sys.sysfilegroups           AS fg WITH (NOLOCK) ON fg.groupid = aa.data_space_id
left JOIN (
 SELECT
   object_id, index_id, SUM(row_count) AS Rows,
   CONVERT(numeric(19,3), CONVERT(numeric(19,3), SUM(in_row_reserved_page_count+lob_reserved_page_count+row_overflow_reserved_page_count))/CONVERT(numeric(19,3), 128)) AS SizeMB
 FROM sys.dm_db_partition_stats WITH (NOLOCK)
 GROUP BY object_id, index_id
 ) AS partition1 ON iib.object_id=partition1.object_id AND iib.index_id=partition1.index_id
 CROSS APPLY
 (
    SELECT
    (
      SELECT sys.columns.name + ', '
      FROM sys.index_columns WITH (NOLOCK)
      JOIN sys.columns WITH (NOLOCK) ON sys.index_columns.column_id=sys.columns.column_id
                       AND sys.index_columns.object_id=sys.columns.object_id
      WHERE sys.index_columns.is_included_column=0
      AND iib.object_id=sys.index_columns.object_id AND iib.index_id=sys.index_columns.index_id
      ORDER BY key_ordinal
      FOR XML PATH('')
    ) AS index_columns_key,
    (
      SELECT sys.columns.name + ', '
      FROM sys.index_columns WITH (NOLOCK)
      JOIN sys.columns WITH (NOLOCK) ON sys.index_columns.column_id=sys.columns.column_id
                       AND sys.index_columns.object_id=sys.columns.object_id
      WHERE sys.index_columns.is_included_column=1
      AND iib.object_id=sys.index_columns.object_id AND iib.index_id=sys.index_columns.index_id
      ORDER BY index_column_id
      FOR XML PATH('')
    ) AS index_columns_include
 ) AS IndexColumns
WHERE stat.database_id = DB_ID()
and oo.is_ms_shipped = 0
and oo.name not in ( 'sysdiagrams' )
and iib.is_hypothetical = 0
ORDER BY oo.name, iib.name

The index you mention will probably work. But I prefer to put an index in as an experiment and use the script above to get factual measurements as to how good an index is.