If a FK does not have a dedicated index on them but are part of wider indexes used for covering queries, Should they have a dedicated index created?
It depends on the table's access patterns. If the column is being searched a lot (and, ideally, is highly selective), then yes, you absolutely should have an index on that column, with the column as the first key column in the definition.
Should I be removing some of these indexes and combining them with included columns instead? then have dedicated indexes for my foreign keys?
What was given in the question is somewhat unclear, and the question you've asked is a bit... confused, so let's take a step back for a second.
In SQL Server 2005+, the three most important parts of an index definition are:
The key columns, which determines the index sort order. This means the order of the key columns is very important, because SQL Server uses an index by searching for a value in the first key column, then in the second key column, etc.
The included columns, which are copies of row data tagged onto the index structure. The order included columns are specified is irrelevant.
Is the index unique? This means that the index key can contain only unique combinations of column values.
(While this is not relevant to the discussion at hand, for completeness I will mention it here: SQL Server 2008+ introduces the concept of filtered indexes, which only includes rows in the index that satisfy a predicate.)
The first thing you should do is index consolidation. This involves using the points above to combine indexes that share commonalities.
For example, consider the following two indexes:
CREATE INDEX IX_1 ON [dbo].[t1](C1) INCLUDE(C3, C4);
CREATE INDEX IX_2 ON [dbo].[t1](C1, C2) INCLUDE(C5);
These indexes share the leading key column, C1
. Included columns can be specified in any order, so these two indexes could be combined as follows:
CREATE INDEX IX_3 ON [dbo].[t1](C1, C2) INCLUDE(C3, C4, C5);
Where index keys differ in their composition or other properties, you have to be very careful. Consider these indexes:
CREATE INDEX IX_4 ON [dbo].[t1](C1, C3) INCLUDE(C4);
CREATE UNIQUE INDEX IX_5 ON [dbo].[t1](C1, C4) INCLUDE(C5);
Now the decision is not as easy. You have to determine what to do based on your workload, which queries hit the table, and the selectivity of the data itself.
So to answer the question more directly: if you currently have one or more indexes where the column of interest is the first key column in those indexes, you don't have to add more indexes, because the indexes you have are useful.
If the column is searched frequently and there isn't an index with that column as the first key column, you should create an index with that column as the first key column. (Depending on query requirements, you may want to specify other columns as well, for either the key or the included columns.)
If the column is not searched frequently, you can potentially get away with having it contained in another index (not the first key column): the query may be satisfied by scanning the index that contains the column. This is not as efficient as an index seek (for many reasons), but if this operation doesn't happen too often, and the performance in this case is acceptable, you may be okay.
Remember that creating indexes isn't free -- they take up data space, log space, cache memory, and can potentially slow down INSERT
/UPDATE
/DELETE
activity (having said that, there can be other advantages to creating indexes). It's a balance you have to strike for your environment.
Performance Optimisation. By removing either the PK or an Index, there
will be less pages needed for my indexes = faster writes, plus also
maintenance/operational benefits, i.e. one less index to keep
defragged etc.
You need to be able to prove that what's proposed will actually help (cost vs. benefit). For what reason is this change being considered? Are there actually performance issues with this table, or did it just "look wrong"?
Here are some other questions that will help you come to the best decision for your environment:
How much time would be saved in the maintenance window? In the backup window?
How much storage space would this save (data files, log files, backups, etc.)?
Is INSERT
performance on this table really a bottleneck right now? How much would it improve? Is removing an index the best strategy to fix that problem?
Will this cause problems with database tools and frameworks (ORMs particularly) that expect each table to have a primary key and not just a unique index? Transactional Replication requires a primary key on published tables.
Is a self-documenting database schema important?
Despite its limited use, is the narrowness of the primary key index still allowing the optimizer to produce more efficient plans for certain queries? (Use sys.dm_db_index_usage_stats
to find out.)
Personally speaking, from what you've told us, I would leave it alone until it can be proven that both (a) the extra index is a problem, and (b) removing it is the solution.
Best Answer
The missing index suggestions are opportunistic entries added whenever the optimizer happens to notice that an exact-match index for the current set of predicates it is considering do not exist on the base object.
The information recorded in the DMVs is intended to be a helpful input to the normal activities of a skilled database tuner; it is not intended to be interpreted as the result of a comprehensive analysis. The data recorded is intentionally a complete set of the keys and other columns needed, together with a brief indication of the reason it was added (equality, inequality, include).
The "impact" recorded is also very basic: it is the difference in estimated cost, according to the cost model used by the optimizer. As usual, this needs to be treated with a healthy dose of skepticism, because it is very unlikely that the actual performance characteristics of your hardware match the optimizer's model, even vaguely.
Note also that cost estimations depend heavily on cardinality (row count), average row size, and distribution information. If these are not reasonably close to the actual values, the "cost" is an even less reliable metric. Nevertheless, right now, estimated cost is all we have, so that's the number reported. Again, SQL Server is just trying to be as helpful as it can; it is up to the person reviewing the information to interpret it correctly.
Tests based on estimated costs are potentially informative to an early assessment of target areas for improvement, but they are no substitute for actual measurements of whatever metrics are important in your environment. The relative importance of duration, CPU usage, physical reads, memory usage ... and so on ... will be different in different circumstances.
The point really is to measure where the real problems are, record a baseline, make improvements, then measure and assess the benefits against the costs.
For a more thorough analysis than the missing index feature provides, you would need to run a representative workload through the Database Engine Tuning Advisor. Even there, the recommendations will still need to be assessed by a human, and something similar to the above baseline-assess-change-measure protocol followed.
That said, not all final decisions will be based on performance comparisons. There may be other considerations, like the avoidance of a particularly inconvenient lookup-related deadlock. Index tuning is as much art as science sometimes.