Sql-server – Are these good indexing guidelines

indexsql server

I use indexes regularly but it's still hard for me in certain conditions to know if they are helping or hurting. There are a few guidelines I follow, but I am not sure if they are good nor am I sure of their justification.

  1. It is better to create an index on a narrow datatype than it is on a wide datatype (e.g. INT over DATETIME).

  2. It is better to create an index on multiple columns than it is on a single column.

  3. It is better to create an index on a column that is never (or rarely) updated than it is to index frequently changing columns.

Are these good guidelines? Since I'm not entirely sure why I follow these guidelines, can you help explain what is the justification for each and when would they not apply?

Best Answer

Other points (as noted, "How do I index?" is a topic for a book not a single post - also, so many of the answers come down to "it depends on your database and your workload"):

  • Index selectivity is key. If you try and index a field that doesn't have many distinct values (i.e. a true/false bit field) then using that index is actually slower than just doing a table scan (but it still has to be maintained, thereby slowing DML calls [insert/update/delete] down to no benefit).
  • In general, you're right about indexing narrow fields, but I'd rephrase it to "be very careful about indexing wide fields". If the index field is too narrow, you run into the selectivity problem above, but the wider the field the bigger the index is ("how much bigger?" depends on which DBMS you use).
  • Indexing updated fields - If you have an index on a frequently-updated field, then yes that index will slow down updates to that field. However, if you're using that field quite often in query criteria it may still be worth it to index it. (see above: "it depends")
  • Multi-column indexes: This is a tricky one:
    • Covering indexes (an index that contains all the fields for a query) can speed up queries (because then the query only has to look at the index - it doesn't have to refer to the base table).
    • Multi-column indexes generally have higher selectivity.
    • Multi-column indexes require more space.
    • They're only useful if the query filters on either the full index, or a leading subset. I.e. if you have an index on (State, County, ZIP), then queries filtering on (State), (State, County) or (State, County, ZIP) can use that index. Queries that filter on (County, ZIP), (County) or (ZIP) cannot use that index.
    • Corollary: The order of columns in a multi-column index is very important.
    • If you have a multi-column index (State, County, ZIP), then a single-column index on (State) would be redundant (since State is the first column, the multi-column index can be used for that). Note that SQL Server and Sybase (not sure about other RDBMS systems) don't prevent you from creating completely redundant indexes.

As always, the best indexing strategy is to analyze the workload your database is under and index to suit that. If you're indexing a data warehouse your indexes are going to be radically different from the indexes on an audit history database.