You are right in that your example query would not use that index.
The query planner will consider using an index if:
- all the fields contained in it are referenced in the query
- some of the fields starting from the beginning are referenced
It will not be able to make use of indexes that start with a field not used by the query.
So for your example:
SELECT [id], [name], [customerId], [dateCreated]
FROM Representatives WHERE customerId=1
ORDER BY dateCreated
it would consider indexes such as:
[customerId]
[customerId], [dateCreated]
[customerId], [dateCreated], [name]
but not:
[name], [customerId], [dateCreated]
If it found both [customerId]
and [customerId], [dateCreated], [name]
its decision to prefer one over the other would depend on the index stats which depend on estimates of the balance of data in the fields. If [customerId], [dateCreated]
were defined it should prefer that over the other two unless you give a specific index hint to the contrary.
It is not uncommon to see one index defined for every field in my experience either, though this is rarely optimal as the extra management needed to update the indexes on insert/update, and the extra space needed to store them, is wasted when half of them may never get used - but unless your DB sees write-heavy loads the performance is not going to stink badly even with the excess indexes.
Specific indexes for frequent queries that would otherwise be slow due to table or index scanning is generally a good idea, though don't overdo it as you could be exchanging one performance issue for another. If you do define [customerId], [dateCreated]
as an index, for example, remember that the query planner will be able to use that for queries that would use an index on just [customerId]
if present. While using just [customerId]
would be slightly more efficient than using the compound index this may be mitigated by ending up having two indexes competing for space in RAM instead of one (though if your entire normal working set fits easily into RAM this extra memory competition may not be an issue).
Indexes are in most cases a B-Tree structure (or some sort of). There are DBMS that supported different indexing types.
As you are talking about longitude/latitude PostgreSQL with it's GIST indexes come to mind. Oracle has Bitmap indexes in addition to the B-Tree indexes. I'm sure SQL Server and DB2 also have some special index types. And then there are full text indexes which make searching text very efficient.
A B-Tree index is very efficient in finding a specific value - think of a primary key index where all values are different. If the index only contains the PK column(s) (i.e. it is not a clustered index) then typically looking up a row by a specific PK values takes not more than (roughly) 3-4 IO operations (at least with Oracle). 2-3 to find the index block and an additional one to read the whole row. This gets more efficient if the index contains additional columns so that the lookup of the actual table row is not needed. The term for that is "covering index" or "index only retrieval".
Now for doing "range lookups" (e.g. where foo > 42
) an index is very helpful as well as in most DBMS the index can also be scanned according to a predicate. Usually (again this highly depends on the DBMS) this is slightly less efficient than a direct lookup (again this also depends on the ability to do an "index only retrieval").
I don't know of any BMS which can not use more than one index in a query. Think a join on a PK and a FK column - depending on the data distribution the DBMS might use the index to find the parent rows (PK lookup) and the child rows (FK lookup).
But not all DBMS can use more than one index for the same table in a single query.
After all whether or not an index is being used or not depends on a lot of things.
I can highly recommend http://use-the-index-luke.com/ which is a very good introduction on indexing across all major DBMS.
DBMS specific information:
Oracle: http://docs.oracle.com/cd/E11882_01/server.112/e25789/indexiot.htm
PostgreSQL: http://www.postgresql.org/docs/current/static/indexes.html
Best Answer
Answer for SQL Server
The short answer is that the one column index could work depending on your workload & data, but there are different options like
included columns
in sql server, to store an unordered column.Two indexes could be used, but in most cases (for sql server) a
key lookup
operator will be applied to get the other column(s) needed. Accessing two indexes / adding akey lookup
operator will be slower due to the addedJOIN
operator between the two.An example of when it should go fast is when the optimizer thinks that almost no or no values in the specified range of
latitute
exist (latitude > 51.4946 AND latitude < 51.5079
) while having the one column index, the seek should be pretty fast (keeping in mind that the statistics are correct & up to date). See Example 1 for this part.Another important part of your example query is that with ranges & indexes, the secondary column will be added as a residual predicate in a simple seek even when both columns are key columns.
See Example 3 for the residual predicate explanation.
Start data
Example 1 One column index + (unused) key lookup
Single column indexes
Due to no values returned by the seek on latitude, no key lookup (to get the
longitude
column and apply a residual predicate on that column) operation happened:Example 2 Removing the key lookup
When we include the
longitude
in the index, no key lookup operator is found (but there is a residual predicate on this column.From these examples we can make up that, if SQL server cannot find a first selective filter on
latitude
orlongitude
one of these will be picked and the residual predicate / key lookup will be the operation that will hurt us when the data set increases.Example 3 When adding the double key column indexes
Here the decision of selectivity becomes important. Will the first key column selected be latitude or longitude, which one will always be more selective? If that question cannot be answered you could opt to create two indexes:
But due to the fact that you are working with ranges
<>
, the secondary column will be added as a residual predicate:The best indexes for the query
All this shows us, that for SQL Server & your particular query the better indexes would be
Where the included columns are not ordered since we are not getting around the residual predicate.
Unless we are also executing queries with different filters on
Longitude
orLatitude
, such as=
on both columns or=
on one of the columns, right indexing could remove the residual predicate for these types of queries.Some more information on indexing range queries can be found here.