Sql-server – Index included columns

sql server

After running a fairly hefty query, the execution plan gave me a missing index suggestion which was of the form:

(Timestamp) INCLUDE (CustomerID, EventID, ID, EmployeeID)

Which seems to be a covering index (the INCLUDE columns are all either primary keys (ID) or foreign keys). However, my query's WHERE clause is filtering by Timestamp, CustomerID, and EventID. I don't know why these weren't included in the main part of the index.

So my question is, is there any difference between using the suggested index above and what I think is a better alternative:

(Timestamp, CustomerID, EventID) INCLUDE (ID, EmployeeID)

My understanding is that this will still allow Timestamp-only index seeking, but will also further assist my query by having the customer and event IDs (which are filtered) in the main part.

I think this was something to do with the width of the 'main' part – FYI, Timestamp is a datetime2(0), CustomerID is an int, and EventID is a byte.

I am testing this myself at the moment, but this is a HUGE table – over 1,000,000,000 rows – and it is taking time to compare the indexes. That, and I'd like to learn more about this.

Thanks.

Best Answer

So my question is, is there any difference in using the suggested index above, or what I think is a better alternative...

The missing-index suggestions made by the optimizer are opportunistic and relevant only to the particular query concerned. The optimizer goes through an index analysis phase, where it might note the absence of a covering index it would have liked to use. These suggestions are not intended to be a replacement for a full workload-representative Database Engine Tuning Advisor (DTA) session, much less proper index design by a skilled database practitioner based on wide knowledge of the data and critical queries.

The suggestions should always be reviewed, as you have done, to ensure an optimal set of indexes for all queries is created - not one covering index per query as could be the case if the suggestions were followed literally.
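
One way to review suggestions across the whole workload, rather than one query at a time, is to read the missing-index DMVs, which accumulate the same suggestions while the instance is running. A minimal sketch follows; the TOP (25) cut-off and the ordering expression are arbitrary choices made here for illustration, not a recommended scoring formula:

```sql
-- Missing-index suggestions recorded for the current database, roughly ordered
-- by how often and how much the optimizer estimated each would have helped.
SELECT TOP (25)
    mid.statement AS table_name,
    mid.equality_columns,
    mid.inequality_columns,
    mid.included_columns,
    migs.user_seeks,
    migs.avg_total_user_cost,
    migs.avg_user_impact
FROM sys.dm_db_missing_index_details AS mid
JOIN sys.dm_db_missing_index_groups AS mig
    ON mig.index_handle = mid.index_handle
JOIN sys.dm_db_missing_index_group_stats AS migs
    ON migs.group_handle = mig.index_group_handle
WHERE mid.database_id = DB_ID()
ORDER BY migs.user_seeks * migs.avg_total_user_cost * migs.avg_user_impact DESC;
```

Suggestions that differ only in their included columns, or that share a leading key column, are candidates for a single hand-designed index.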

There are naturally implications when widening the keys of an index compared with using INCLUDE columns, some of which have been noted by others. I personally prefer to INCLUDE the clustering keys explicitly where they are useful. A nonclustered index on a clustered table always carries the clustering key columns implicitly, but clustered indexes can be changed, and it is rare that the person performing this change would check to see if any queries were relying on the implicit behaviour.

Changing columns from INCLUDE to keys may also affect update query plans (overall shape and Halloween Protection requirements) and there are logging implications where keys of an index might change too.

I would probably choose to modify the suggestion as you have done, but I would be careful to validate update (= insert/update/delete/merge) query plans for the affected table.
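
Written out in full, the two candidates from the question might look like the sketch below. The table name dbo.EventLog and the index names are hypothetical (the question does not give them), and the sketch assumes ID is the clustering key, listed explicitly as discussed above. In practice you would create only one of the two:

```sql
-- Hypothetical table and index names; only the column lists come from the question.

-- As suggested by the optimizer: a single key column, everything else INCLUDEd.
CREATE NONCLUSTERED INDEX IX_EventLog_Timestamp
    ON dbo.EventLog ([Timestamp])
    INCLUDE (CustomerID, EventID, ID, EmployeeID);

-- Modified version: the filtered columns are promoted to keys, the rest stay INCLUDEd.
-- ID is listed explicitly (assumed here to be the clustering key).
CREATE NONCLUSTERED INDEX IX_EventLog_Timestamp_CustomerID_EventID
    ON dbo.EventLog ([Timestamp], CustomerID, EventID)
    INCLUDE (ID, EmployeeID);
```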

Related Solutions

Sql-server – How are bit fields sorted for an index

A SQL Server bit field can be 0, 1 or NULL, so I'm unsure where the -1 part of your question features.

Skipping past that, the most appropriate solution for "sticky threads" in a forum database is to use a separate filtered index. Flag a sticky thread as IsSticky=1 and create a filtered index along the lines of:

CREATE NONCLUSTERED INDEX IX_Posts_StickyThread
    ON dbo.Posts (ForumId, PostId)
WHERE IsSticky = 1;

Filtered indexes were new in SQL Server 2008, so this wouldn't be an option for SQL Server 2005. The SQLCAT article Using a Low-Selectivity BIT Column First Can Be the Best Strategy would be applicable reading for an approach to this in 2005 (edit: the article has been archived to the eBook SQLCAT's Guide to Relational Engine).
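
For illustration, a query of roughly this shape can typically be answered from the filtered index above, because its WHERE clause repeats the index's filter predicate as a literal and does not return IsSticky itself. The @ForumId variable and its value are assumptions made for the example:

```sql
DECLARE @ForumId int = 42;  -- hypothetical forum id

SELECT p.PostId
FROM dbo.Posts AS p
WHERE p.ForumId = @ForumId
  AND p.IsSticky = 1        -- matches the filtered index's WHERE predicate
ORDER BY p.PostId;
```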

And just to address the title question: bit fields in an index are ordered the same as by ORDER BY bit_field, i.e. NULL, 0, 1 (credit @MartinSmith).
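
A quick way to see that ordering for yourself is a throwaway table variable (the names @Flags and Flag are made up for this sketch):

```sql
-- Insert the three possible bit values out of order, then sort ascending.
DECLARE @Flags TABLE (Flag bit NULL);

INSERT INTO @Flags (Flag)
VALUES (1), (NULL), (0);

SELECT Flag
FROM @Flags
ORDER BY Flag;  -- ascending order returns NULL first, then 0, then 1
```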

Sql-server – SQL server indexing foreign keys, covering indexes included columns

If FKs do not have a dedicated index but are part of wider indexes used for covering queries, should they have a dedicated index created?

It depends on the table's access patterns. If the column is being searched a lot (and, ideally, is highly selective), then yes, you absolutely should have an index on that column, with the column as the first key column in the definition.

Should I be removing some of these indexes and combining them with included columns instead, and then have dedicated indexes for my foreign keys?

What was given in the question is somewhat unclear, and the question you've asked is a bit... confused, so let's take a step back for a second.

In SQL Server 2005+, the three most important parts of an index definition are:

The key columns, which determine the index sort order. This means the order of the key columns is very important, because SQL Server uses an index by searching for a value in the first key column, then in the second key column, etc.
The included columns, which are copies of row data tagged onto the index structure. The order in which included columns are specified is irrelevant.
Whether the index is unique. In a unique index, the index key can contain only unique combinations of column values.
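
The point about key column order can be sketched with a small, entirely hypothetical table (dbo.KeyOrderDemo and its index name are assumptions for this example). Whether the optimizer actually picks a seek also depends on statistics and table size, so treat the comments as what the index makes possible, not as guaranteed plans:

```sql
CREATE TABLE dbo.KeyOrderDemo (C1 int NOT NULL, C2 int NOT NULL, C3 int NULL);

CREATE INDEX IX_KeyOrderDemo_C1_C2
    ON dbo.KeyOrderDemo (C1, C2);

-- Can seek: the predicate starts at the leading key column.
SELECT C1, C2 FROM dbo.KeyOrderDemo WHERE C1 = 1;
SELECT C1, C2 FROM dbo.KeyOrderDemo WHERE C1 = 1 AND C2 = 2;

-- Cannot seek on this index: C2 alone skips the leading key column,
-- so the best this index can offer is a scan.
SELECT C1, C2 FROM dbo.KeyOrderDemo WHERE C2 = 2;
```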

(While this is not relevant to the discussion at hand, for completeness I will mention it here: SQL Server 2008+ introduces the concept of filtered indexes, which include only the rows that satisfy a predicate.)

The first thing you should do is index consolidation. This involves using the points above to combine indexes that share commonalities.

For example, consider the following two indexes:

CREATE INDEX IX_1 ON [dbo].[t1](C1) INCLUDE(C3, C4);
CREATE INDEX IX_2 ON [dbo].[t1](C1, C2) INCLUDE(C5);

These indexes share the leading key column, C1. Included columns can be specified in any order, so these two indexes could be combined as follows:

CREATE INDEX IX_3 ON [dbo].[t1](C1, C2) INCLUDE(C3, C4, C5);
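
Once the combined index exists, the two originals are largely redundant and could be dropped. A sketch of that step, assuming nothing references IX_1 or IX_2 by name (index hints, for instance) and that the affected query plans have been rechecked first:

```sql
-- Remove the indexes that IX_3 replaces.
DROP INDEX IX_1 ON [dbo].[t1];
DROP INDEX IX_2 ON [dbo].[t1];
```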

Where index keys differ in their composition or other properties, you have to be very careful. Consider these indexes:

CREATE INDEX IX_4 ON [dbo].[t1](C1, C3) INCLUDE(C4);
CREATE UNIQUE INDEX IX_5 ON [dbo].[t1](C1, C4) INCLUDE(C5);

Now the decision is not as easy. You have to determine what to do based on your workload, which queries hit the table, and the selectivity of the data itself.
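
One input to that decision is how each index is actually being used. The sketch below reads the index usage DMV for the example table; note that these counters are reset when the instance restarts, so they only reflect the workload seen since then:

```sql
-- Reads (seeks, scans, lookups) versus maintenance cost (updates) per index on dbo.t1.
SELECT
    i.name AS index_name,
    i.is_unique,
    ius.user_seeks,
    ius.user_scans,
    ius.user_lookups,
    ius.user_updates
FROM sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS ius
    ON ius.object_id = i.object_id
   AND ius.index_id = i.index_id
   AND ius.database_id = DB_ID()
WHERE i.object_id = OBJECT_ID(N'dbo.t1');
```

An index with many updates and few or no reads is a candidate for consolidation or removal, although a unique index may still be needed to enforce the constraint.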

So to answer the question more directly: if you currently have one or more indexes where the column of interest is the first key column in those indexes, you don't have to add more indexes, because the indexes you have are useful.

If the column is searched frequently and there isn't an index with that column as the first key column, you should create an index with that column as the first key column. (Depending on query requirements, you may want to specify other columns as well, for either the key or the included columns.)

If the column is not searched frequently, you can potentially get away with having it contained in another index (not the first key column): the query may be satisfied by scanning the index that contains the column. This is not as efficient as an index seek (for many reasons), but if this operation doesn't happen too often, and the performance in this case is acceptable, you may be okay.

Remember that creating indexes isn't free -- they take up data space, log space, cache memory, and can potentially slow down INSERT/UPDATE/DELETE activity (having said that, there can be other advantages to creating indexes). It's a balance you have to strike for your environment.
