Sql-server – Are these good indexing guidelines?


I use indexes regularly, but in certain situations it's still hard for me to know whether they are helping or hurting. There are a few guidelines I follow, but I am not sure whether they are good, nor am I sure of their justification.

It is better to create an index on a narrow datatype than it is on a wide datatype (e.g. INT over DATETIME).
It is better to create an index on multiple columns than it is on a single column.
It is better to create an index on a column that is never (or rarely) updated than it is to index frequently changing columns.

Are these good guidelines? Since I'm not entirely sure why I follow them, can you help explain the justification for each, and when they would not apply?

Best Answer

Other points (as noted, "How do I index?" is a topic for a book, not a single post; also, many of the answers come down to "it depends on your database and your workload"):

Index selectivity is key. If you index a field that doesn't have many distinct values (e.g. a true/false bit field), then seeking on that index and looking up each matching row in the base table is often slower than just doing a table scan, so the optimizer will usually ignore the index. It still has to be maintained, though, which slows down DML calls (INSERT/UPDATE/DELETE) for no benefit.
In general, you're right about indexing narrow fields, but I'd rephrase it as "be very careful about indexing wide fields". If the indexed field is too narrow, you can run into the selectivity problem above; but the wider the field, the bigger the index ("how much bigger?" depends on which DBMS you use).
Indexing updated fields: If you have an index on a frequently updated field, then yes, that index will slow down updates to that field. However, if you use that field often in query criteria, it may still be worth indexing. (See above: "it depends".)
Multi-column indexes: This is a tricky one:
- Covering indexes (an index that contains all the fields a query needs) can speed up queries, because the query only has to read the index and never has to go back to the base table.
- Multi-column indexes generally have higher selectivity.
- Multi-column indexes require more space.
- They're only useful if the query filters on either the full index key or a leading subset of it. For example, if you have an index on (State, County, ZIP), then queries filtering on (State), (State, County) or (State, County, ZIP) can use that index. Queries that filter on (County, ZIP), (County) or (ZIP) cannot use that index.
- Corollary: The order of columns in a multi-column index is very important.
- If you have a multi-column index (State, County, ZIP), then a single-column index on (State) would be redundant (since State is the first column, the multi-column index can be used for that). Note that SQL Server and Sybase (not sure about other RDBMS systems) don't prevent you from creating completely redundant indexes.
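A minimal T-SQL sketch of the (State, County, ZIP) example. The dbo.Address table, its column types, and the index name are hypothetical. The first two queries filter on a leading subset of the index key, so the optimizer can seek on the index; the third filters only on County and can at best scan it. Because a nonclustered index on a clustered table also carries the clustering key (AddressID here), each query below is covered by the index. Whether the optimizer actually chooses a seek still depends on the data and statistics.

```sql
-- Hypothetical table; names and types are for illustration only.
CREATE TABLE dbo.Address (
    AddressID INT IDENTITY(1, 1) NOT NULL PRIMARY KEY CLUSTERED,
    Street    VARCHAR(100) NOT NULL,
    State     CHAR(2)      NOT NULL,
    County    VARCHAR(50)  NOT NULL,
    ZIP       CHAR(5)      NOT NULL
);

-- Key column order matters: State leads, then County, then ZIP.
CREATE INDEX IX_Address_State_County_ZIP
    ON dbo.Address (State, County, ZIP);

-- Redundant for seeks: the composite index already leads with State.
-- CREATE INDEX IX_Address_State ON dbo.Address (State);

-- Leading subset (State): can seek.
SELECT AddressID, County, ZIP
FROM dbo.Address
WHERE State = 'WA';

-- Leading subset (State, County): can seek.
SELECT AddressID, ZIP
FROM dbo.Address
WHERE State = 'WA' AND County = 'King';

-- Skips the leading column: no seek on this index, at best a scan of it.
SELECT AddressID, State
FROM dbo.Address
WHERE County = 'King';
```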

As always, the best indexing strategy is to analyze the workload your database is under and index to suit that. If you're indexing a data warehouse, your indexes are going to be radically different from the indexes on an audit history database.
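On SQL Server, one place to start analyzing the workload is the sys.dm_db_index_usage_stats DMV, which counts seeks, scans, lookups, and updates per index. The sketch below is only a starting point, not a complete methodology: the counters are reset when the instance restarts, so they reflect activity only since then, and an index with many user_updates but few reads is a candidate for review rather than automatic removal.

```sql
-- Reads vs. writes per index in the current database (since the counters were last reset).
SELECT
    OBJECT_SCHEMA_NAME(i.object_id) AS schema_name,
    OBJECT_NAME(i.object_id)        AS table_name,
    i.name                          AS index_name,
    ISNULL(s.user_seeks, 0)         AS user_seeks,
    ISNULL(s.user_scans, 0)         AS user_scans,
    ISNULL(s.user_lookups, 0)       AS user_lookups,
    ISNULL(s.user_updates, 0)       AS user_updates
FROM sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS s     -- no row means no recorded use
    ON  s.database_id = DB_ID()
    AND s.object_id   = i.object_id
    AND s.index_id    = i.index_id
WHERE OBJECTPROPERTY(i.object_id, 'IsUserTable') = 1
  AND i.index_id > 0                           -- index_id 0 is a heap, not an index
ORDER BY user_updates DESC;
```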

Related Solutions

Sql-server – Indexing – Uniqueidentifier Foreign Key or Intermediary mapping table

OK, I am making a lot of assumptions with this answer (INT instead of VARCHAR(50) being one of them), so feel free to correct me if needed. The problem with Option B is that it introduces an extra join to relate Users to Alerts without any real added benefit. If you join on UserID, it is best to index UserID so that your joins can use seeks.

For Option A, UserID will be the clustering key (the index key of the clustered index) on the Users table. UserID will be a nonclustered index key on the Alerts table. This will cost 16 bytes per Alert.

For Option B, UserID will be the clustering key on the Users table. UserID will probably be the clustering key in UserMap too, to make joining more efficient. UserKey (assuming it is an INT) would then be a nonclustered index key on the Alerts table. This will cost 4 bytes per Alert, plus 20 bytes per UserMap row (the 16-byte UserID plus the 4-byte UserKey).

Looking at the big picture, one relationship in Option A costs 16 bytes of storage and involves 1 join operation, whereas one relationship in Option B costs 24 bytes of storage and involves 2 join operations.
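To make the comparison concrete, here is a hypothetical sketch of both designs. Only Users, Alerts, UserMap, UserID, and UserKey come from the answer; the Message and AlertID columns, the AlertsB name (used so both options fit in one script), and the constraint details are assumptions. COL_LENGTH shows the per-row key widths the storage estimate relies on, and the two final queries show the extra join that Option B adds.

```sql
-- Option A: Alerts references Users directly via the 16-byte UNIQUEIDENTIFIER.
CREATE TABLE dbo.Users (
    UserID UNIQUEIDENTIFIER NOT NULL PRIMARY KEY CLUSTERED
);

CREATE TABLE dbo.Alerts (
    AlertID INT IDENTITY(1, 1) NOT NULL PRIMARY KEY CLUSTERED,
    UserID  UNIQUEIDENTIFIER NOT NULL REFERENCES dbo.Users (UserID),
    Message NVARCHAR(400)    NOT NULL    -- hypothetical payload column
);
CREATE INDEX IX_Alerts_UserID ON dbo.Alerts (UserID);   -- lets the join seek

-- Option B: a UserMap table translates UserID into a 4-byte UserKey.
CREATE TABLE dbo.UserMap (
    UserID  UNIQUEIDENTIFIER NOT NULL PRIMARY KEY CLUSTERED
            REFERENCES dbo.Users (UserID),
    UserKey INT IDENTITY(1, 1) NOT NULL UNIQUE
);

CREATE TABLE dbo.AlertsB (    -- hypothetical name so both options can coexist
    AlertID INT IDENTITY(1, 1) NOT NULL PRIMARY KEY CLUSTERED,
    UserKey INT           NOT NULL REFERENCES dbo.UserMap (UserKey),
    Message NVARCHAR(400) NOT NULL
);
CREATE INDEX IX_AlertsB_UserKey ON dbo.AlertsB (UserKey);

-- Key widths behind the storage estimate.
SELECT COL_LENGTH('dbo.Alerts', 'UserID')   AS option_a_key_bytes,   -- 16
       COL_LENGTH('dbo.AlertsB', 'UserKey') AS option_b_key_bytes;   -- 4

-- Option A: one join.
SELECT u.UserID, a.Message
FROM dbo.Users AS u
JOIN dbo.Alerts AS a ON a.UserID = u.UserID;

-- Option B: two joins for the same result.
SELECT u.UserID, a.Message
FROM dbo.Users AS u
JOIN dbo.UserMap AS m ON m.UserID = u.UserID
JOIN dbo.AlertsB AS a ON a.UserKey = m.UserKey;
```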

Furthermore, there are 2^128 (about 3.4 × 10^38) possible uniqueidentifier values, but only 2^32 (4,294,967,296) INT values. Implementing a uniqueidentifier-to-INT map for this type of relationship could cause unexpected results if you ever run out of INT values and start reusing them.

The only reason to create this type of map table is if you plan to create a many-to-many relationship between Users and Alerts.

Taking all of this into consideration, I would recommend Option A.

I hope this helps,

Matt

Sql-server – SQL server indexing foreign keys, covering indexes included columns

If FK columns do not have a dedicated index but are part of wider indexes used for covering queries, should they have a dedicated index created?

It depends on the table's access patterns. If the column is being searched a lot (and, ideally, is highly selective), then yes, you absolutely should have an index on that column, with the column as the first key column in the definition.

Should I be removing some of these indexes and combining them with included columns instead, and then have dedicated indexes for my foreign keys?

What was given in the question is somewhat unclear, and the question you've asked is a bit... confused, so let's take a step back for a second.

In SQL Server 2005+, the three most important parts of an index definition are:

The key columns, which determine the index sort order. The order of the key columns is very important, because SQL Server uses an index by searching for a value in the first key column, then in the second key column, and so on.
The included columns, which are copies of row data stored at the leaf level of the index. The order in which included columns are specified is irrelevant.
Whether the index is unique. If it is, the index key can contain only unique combinations of column values.

(While this is not relevant to the discussion at hand, for completeness I will mention it here: SQL Server 2008+ introduces the concept of filtered indexes, which include only the rows that satisfy a predicate.)
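A hypothetical example that puts the three parts, plus the optional filter, into a single definition. The dbo.Orders table, its columns, and the index name are made up for illustration. Note that on a filtered unique index, uniqueness is enforced only among the rows that satisfy the filter.

```sql
-- Hypothetical table used only to show the parts of an index definition.
CREATE TABLE dbo.Orders (
    OrderID    INT IDENTITY(1, 1) NOT NULL PRIMARY KEY CLUSTERED,
    CustomerID INT            NOT NULL,
    OrderDate  DATETIME2      NOT NULL,
    Status     TINYINT        NOT NULL,
    Total      DECIMAL(10, 2) NOT NULL,
    ShippedAt  DATETIME2      NULL
);

CREATE UNIQUE INDEX UX_Orders_Customer_Date   -- UNIQUE: key combinations may not repeat
    ON dbo.Orders (CustomerID, OrderDate)     -- key columns: their order defines the sort order
    INCLUDE (Status, Total)                   -- included columns: their order is irrelevant
    WHERE ShippedAt IS NULL;                  -- filter (2008+): only unshipped orders are indexed
```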

The first thing you should do is index consolidation. This involves using the points above to combine indexes that share commonalities.

For example, consider the following two indexes:

```sql
CREATE INDEX IX_1 ON [dbo].[t1](C1) INCLUDE(C3, C4);
CREATE INDEX IX_2 ON [dbo].[t1](C1, C2) INCLUDE(C5);
```

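These indexes share the leading key column, C1, and the key of IX_1, (C1), is a leading prefix of the key of IX_2, (C1, C2), so any seek IX_1 supports can also be done on an index keyed on (C1, C2). Included columns can be specified in any order, so these two indexes could be combined as follows: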

```sql
CREATE INDEX IX_3 ON [dbo].[t1](C1, C2) INCLUDE(C3, C4, C5);
```
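A sketch of carrying out the consolidation, assuming a hypothetical definition for dbo.t1 (the ID column and the INT types are made up so the script runs on its own). Creating IX_3 before dropping the old indexes means queries on C1 are never left without a usable index in between. Before dropping an index on a real system, check that no query names it in an index hint, because such queries fail once the index is gone.

```sql
-- Hypothetical definition of dbo.t1 so the script is self-contained.
CREATE TABLE dbo.t1 (
    ID INT IDENTITY(1, 1) NOT NULL PRIMARY KEY,
    C1 INT NULL, C2 INT NULL, C3 INT NULL, C4 INT NULL, C5 INT NULL
);

-- The two overlapping indexes from the example.
CREATE INDEX IX_1 ON [dbo].[t1](C1) INCLUDE(C3, C4);
CREATE INDEX IX_2 ON [dbo].[t1](C1, C2) INCLUDE(C5);

-- Build the consolidated index first...
CREATE INDEX IX_3 ON [dbo].[t1](C1, C2) INCLUDE(C3, C4, C5);

-- ...then drop the indexes it replaces.
DROP INDEX IX_1 ON [dbo].[t1];
DROP INDEX IX_2 ON [dbo].[t1];
```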

Where index keys differ in their composition or other properties, you have to be very careful. Consider these indexes:

```sql
CREATE INDEX IX_4 ON [dbo].[t1](C1, C3) INCLUDE(C4);
CREATE UNIQUE INDEX IX_5 ON [dbo].[t1](C1, C4) INCLUDE(C5);
```

Now the decision is not as easy. You have to determine what to do based on your workload, which queries hit the table, and the selectivity of the data itself.
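As a rough first check of selectivity, you can compare the number of distinct values with the number of rows. The sketch uses a temporary table, #t1, with made-up data standing in for dbo.t1: a ratio near 1 means the column (or column combination) is highly selective, and a ratio near 0 means it is not. It also shows how a combination of columns can be more selective than either column on its own.

```sql
-- Made-up sample data standing in for dbo.t1.
CREATE TABLE #t1 (C1 INT NOT NULL, C3 INT NOT NULL, C4 INT NOT NULL);
INSERT INTO #t1 (C1, C3, C4)
VALUES (1, 10, 0), (2, 10, 1), (3, 20, 0), (4, 20, 1);

-- Distinct values / total rows per column (COUNT(DISTINCT) ignores NULLs).
SELECT
    COUNT(DISTINCT C1) * 1.0 / NULLIF(COUNT(*), 0) AS c1_selectivity,  -- 4 distinct / 4 rows
    COUNT(DISTINCT C3) * 1.0 / NULLIF(COUNT(*), 0) AS c3_selectivity,  -- 2 distinct / 4 rows
    COUNT(DISTINCT C4) * 1.0 / NULLIF(COUNT(*), 0) AS c4_selectivity   -- 2 distinct / 4 rows
FROM #t1;

-- T-SQL's COUNT(DISTINCT ...) takes a single expression, so count
-- distinct (C3, C4) pairs through a derived table instead.
SELECT
    (SELECT COUNT(*) FROM (SELECT DISTINCT C3, C4 FROM #t1) AS d) * 1.0
    / NULLIF((SELECT COUNT(*) FROM #t1), 0) AS c3_c4_selectivity;      -- 4 distinct / 4 rows
```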

So, to answer the question more directly: if you already have one or more indexes in which the column of interest is the first key column, you don't need to add more, because the existing indexes can already be used to seek on that column.

If the column is searched frequently and there isn't an index with that column as the first key column, you should create an index with that column as the first key column. (Depending on query requirements, you may want to specify other columns as well, for either the key or the included columns.)
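A sketch for spotting foreign keys whose column is not the leading key column of any index, using the catalog views sys.foreign_key_columns and sys.index_columns. The dbo.Customer and dbo.Invoice tables are hypothetical and mirror the question: Invoice.CustomerID appears in a wider index, but not as its first key column, so the query reports it. For simplicity the query checks only the first column of each foreign key; multi-column foreign keys need a more careful comparison.

```sql
-- Hypothetical tables mirroring the question.
CREATE TABLE dbo.Customer (CustomerID INT NOT NULL PRIMARY KEY);
CREATE TABLE dbo.Invoice (
    InvoiceID  INT NOT NULL PRIMARY KEY,
    CustomerID INT NOT NULL
        CONSTRAINT FK_Invoice_Customer REFERENCES dbo.Customer (CustomerID),
    Amount     DECIMAL(10, 2) NOT NULL
);
-- CustomerID is in this index, but not as the first key column.
CREATE INDEX IX_Invoice_Amount_Customer ON dbo.Invoice (Amount, CustomerID);

-- FK columns that are not the leading key column of any index.
SELECT
    OBJECT_SCHEMA_NAME(fkc.parent_object_id)             AS schema_name,
    OBJECT_NAME(fkc.parent_object_id)                    AS table_name,
    COL_NAME(fkc.parent_object_id, fkc.parent_column_id) AS fk_column,
    OBJECT_NAME(fkc.constraint_object_id)                AS fk_name
FROM sys.foreign_key_columns AS fkc
WHERE fkc.constraint_column_id = 1            -- first column of each FK only
  AND NOT EXISTS (
        SELECT 1
        FROM sys.index_columns AS ic
        WHERE ic.object_id   = fkc.parent_object_id
          AND ic.column_id   = fkc.parent_column_id
          AND ic.key_ordinal = 1               -- leading key column (included columns have 0)
  );
```

If such a column turns out to be searched frequently, a dedicated index like `CREATE INDEX IX_Invoice_CustomerID ON dbo.Invoice (CustomerID);` (a hypothetical name) would make it the leading key column.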

If the column is not searched frequently, you can potentially get away with having it contained in another index (but not as the first key column): the query may be satisfied by scanning the index that contains the column. This is not as efficient as an index seek (among other reasons, a scan reads every row of the index instead of navigating straight to the matching ones), but if this operation doesn't happen too often and the performance is acceptable, you may be okay.

Remember that creating indexes isn't free: they take up data space, log space, and cache memory, and can slow down INSERT/UPDATE/DELETE activity (having said that, there can be other advantages to creating indexes). It's a balance you have to strike for your environment.
