How to Find Guidance on Index Strategies

index

Most of us will probably agree that using database indexes is good. Too many indexes and performance can actually be degraded.

As a general rule, which fields should be indexed?
Which fields should not be indexed?
What are the rules for using indexes while striking a balance between too many and not enough indexes in order achieve performance improvements, not degradation?

Best Answer

Short

The "too many indexes" rule is a bit misleading I think.

Long

Given that the average database is around 98% reads (or higher) reads need to be optimised. An INSERT is a read if there is a unique index, for example. Or the WHERE on an update. I once read that even a write intensive database is still 85% reads.

What you do have is poor quality indexing. Examples:

  • wide clustered indexes (SQL Server especially)
  • non-monotonic clustered indexed
  • overlapping indexes (eg cold, cole and cold, cole, colf)
  • many single column indexes (also overlapping with more useful indexes) that are useless for your queries
  • no INCLUDEs, not covering (eg all single column indexes)
  • ...

Note it's quite typical to have indexes several times bigger than your actual data even in OLTP systems.

Generally, I'd start with the

  • clustered index (usually PK)
  • unique indexes (not constraints, these can't be covering)
  • foreign key columns

Then I'd look at:

  • common queries and see what I need. A query running every seconds needs tuning. The report at Sunday 4am can wait.
  • with SQL Server, the weighted missing index DMVs

Saying that, I have broken these rules for some systems after seeing how things panned out (10 billion rows later) to tune a system. But I'd never consider not indexing unless I could demonstrate why I'm doing so.