I inherited a project using SQL Server 2005. Recently there were some performance problems and I started looking at the indexes, and found
_dta_index_survey_25_135059617__K1_K10_K19:
id ASC, sent_date ASC, group_id ASC
and
_dta_index_survey_21_364632442__K18_K1_K2_K9:
group_id ASC, id ASC, campaign_id ASC, sent_date ASC
which appears to be generated by some kind of automatic performance tuning tool. I think SQL Server comes with something like that but I'm not familiar with it.
Anyway, id
is the primary key, and what I'm trying to understand here is how it could possibly be beneficial to include the primary key within a secondary index… because you have to have the leftmost pieces of information to use an index and if you know id
then you've got the row anyway.
Can someone explain why a tool would have generated these indexes and if there's any point in keeping them?
Just because the indexes are getting used doesn't mean they're a good design. SQL Server is using the wrong indexes lately, and I'm trying to determine why. More specifically, queries against this table that ran in milliseconds a month ago are now taking multiple seconds. The table is huge (4M rows), and growing steadily, but at the same pace it's been going at for years.
Almost every query joins across group_id
or campaign_id
. Often both. The application almost never specifically selects or sorts by the PK, but I could see how ID
could be useful when comparing or merging lists of ID
s within the database engine. … But id
is also an identity column, and an index holds lists of row ids, so it still seems redundant to have the ID
in the key when it's also in the index "buckets". And the additional fields could still be covered by the index without being part of the key.
I discovered that (id, sent_date, group_id)
hardly ever gets read. Meanwhile (group_id, id, campaign_id, sent_date)
is one of the hottest indexes. Next I found that an index which I thought was pretty reasonable (campaign_id, sent_date, group_id, id)
also has a index_columns_include
of every other column in the table, and occupies more space than the primary clustered index on id
! (but it's also pretty hot…) Surely I can replace it with a simple (campaign_id, sent_date, group_id)
?
Best Answer
Tuning tools use a shotgun approach. They put all combinations of indexes as a hypothetical indexes. A hypothetical index does not create any of the indexes but it certain type of hypothetical does keep stats on them by how many reads and writes would have taken place. Then the tuning tool gets the hypothetical with the top stats and gives them to you and destroys all hypothetical.
So someone put in indexes from a tuning tool, but I would prefer to get the actual statistics that are kept since the last time SQL Server was restarted. Below is sample code to get the statistics. Look at total reads and compare to total writes. More writes than reads is bad. If I find an index with zero reads, I drop them and I have never had to put them back.
The index you mention will probably work. But I prefer to put an index in as an experiment and use the script above to get factual measurements as to how good an index is.