Sql-server – Do I need separate indexes for each type of query, or will one multi-column index work

indexsql server

I somewhat know the answer to this question already, but I always feel as though there is more I need to pick up on the topic.

My basic understanding is that generally speaking, a single index that just includes all the fields you might be querying/sorting on at any given time isn't likely to be useful, yet I have seen this type of thing. As in, someone thought, "Well, if we just put all this stuff in an index, the database can use it to find what it needs", without having ever seen an execution plan for some of the actual queries being run.

Imagine a table like so:

id int pk/uid
name varchar(50)
customerId int (foreign key)
dateCreated datetime

I might see a single index including the name, customerId and dateCreated fields.

But my understanding is that such an index would not be used in a query like, for example:

SELECT [id], [name], [customerId], [dateCreated]
   FROM Representatives WHERE customerId=1 
   ORDER BY dateCreated

For such a query, it seems to me that a better idea would be an index including the customerId and dateCreated fields, with the customerId field being 'first'. This would create an index that would have the data organized in such a way that this query could quickly find what it needs – in the order that it needs.

Another thing I see, perhaps as frequently as the first, is individual indexes on each field; so, one each on name, customerId and dateCreated fields.

Unlike the first example, this type of arrangement seems to me sometimes to at least be partially useful; the query's execution plan may show that at least it's using the index on the customerId to select the records, but it's not using the index with the dateCreated field to sort them.


I know this is a broad question, because the specific answer to any particular query on any particular set of tables is usually to see what the execution plan says it's going to do, and otherwise take the specifics of the table(s) and queries into account. Also, I know that it depends on how often a query might be run as opposed to the overhead of maintaining a particular index for it.

But I suppose what I'm asking is as a general 'starting point' for indexes, does the idea of having specific indexes for specific, frequently-pulled queries and the fields in the WHERE or ORDER BY clauses make sense?

Best Answer

You are right in that your example query would not use that index.

The query planner will consider using an index if:

  • all the fields contained in it are referenced in the query
  • some of the fields starting from the beginning are referenced

It will not be able to make use of indexes that start with a field not used by the query.

So for your example:

SELECT [id], [name], [customerId], [dateCreated]
   FROM Representatives WHERE customerId=1 
   ORDER BY dateCreated

it would consider indexes such as:

[customerId]
[customerId], [dateCreated]
[customerId], [dateCreated], [name]

but not:

[name], [customerId], [dateCreated]

If it found both [customerId] and [customerId], [dateCreated], [name] its decision to prefer one over the other would depend on the index stats which depend on estimates of the balance of data in the fields. If [customerId], [dateCreated] were defined it should prefer that over the other two unless you give a specific index hint to the contrary.

It is not uncommon to see one index defined for every field in my experience either, though this is rarely optimal as the extra management needed to update the indexes on insert/update, and the extra space needed to store them, is wasted when half of them may never get used - but unless your DB sees write-heavy loads the performance is not going to stink badly even with the excess indexes.

Specific indexes for frequent queries that would otherwise be slow due to table or index scanning is generally a good idea, though don't overdo it as you could be exchanging one performance issue for another. If you do define [customerId], [dateCreated] as an index, for example, remember that the query planner will be able to use that for queries that would use an index on just [customerId] if present. While using just [customerId] would be slightly more efficient than using the compound index this may be mitigated by ending up having two indexes competing for space in RAM instead of one (though if your entire normal working set fits easily into RAM this extra memory competition may not be an issue).