You are right that your example query would not use that index.
The query planner will consider using an index if:
- all of its fields are referenced in the query, or
- a leading subset of its fields (starting from the first) is referenced.
It cannot make use of an index whose first field is not used by the query.
So for your example:
SELECT [id], [name], [customerId], [dateCreated]
FROM Representatives WHERE customerId=1
ORDER BY dateCreated
it would consider indexes such as:
[customerId]
[customerId], [dateCreated]
[customerId], [dateCreated], [name]
but not:
[name], [customerId], [dateCreated]
If it found both [customerId] and [customerId], [dateCreated], [name], its choice between them would depend on the index statistics, which in turn depend on estimates of the distribution of data in the fields. If [customerId], [dateCreated] were defined, it should prefer that over the other two unless you give a specific index hint to the contrary.
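You can see this selection behavior directly with any engine that exposes its query plan. Here is a minimal sketch using SQLite (planner details and the exact plan text vary by engine and version; the table mirrors the example above, and the index names are made up for illustration):

```python
import sqlite3

# In-memory table mirroring the Representatives example from the question.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Representatives (
        id INTEGER PRIMARY KEY,
        name TEXT,
        customerId INTEGER,
        dateCreated TEXT
    )
""")
# One compound index whose leading field matches the WHERE clause,
# and one whose leading field does not.
conn.execute("CREATE INDEX ix_cust_date ON Representatives (customerId, dateCreated)")
conn.execute("CREATE INDEX ix_name_cust ON Representatives (name, customerId, dateCreated)")

plan = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT id, name, customerId, dateCreated
    FROM Representatives
    WHERE customerId = 1
    ORDER BY dateCreated
""").fetchall()
for row in plan:
    print(row)
```

The plan should show a search on ix_cust_date: its leading field satisfies the equality predicate and its second field satisfies the ORDER BY, while ix_name_cust cannot be used because its leading field, name, is not constrained by the query.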
In my experience it is not uncommon to see one index defined for every field, though this is rarely optimal: the extra work needed to maintain the indexes on insert/update, and the extra space needed to store them, is wasted when half of them may never be used. Unless your DB sees write-heavy loads, though, performance will not suffer badly even with the excess indexes.
Defining specific indexes for frequent queries that would otherwise be slow due to table or index scanning is generally a good idea, though don't overdo it, as you could be exchanging one performance issue for another. If you do define [customerId], [dateCreated] as an index, for example, remember that the query planner can use it for queries that would otherwise use an index on just [customerId]. While an index on just [customerId] would be slightly more efficient for those queries, that gain may be offset by having two indexes competing for space in RAM instead of one (though if your entire normal working set fits easily into RAM, this extra memory competition may not be an issue).
You can store your index as a list of fixed-size offsets into the block containing your key data. For example:
+-----------------------+
| 3                     |  number of entries
+-----------------------+
| 16                    |  offset of first key data
+-----------------------+
| 24                    |  offset of second key data
+-----------------------+
| 39                    |  offset of third key data
+-----------------------+
| key one               |
+-----------------------+
| key number two        |
+-----------------------+
| this is the third key |
+-----------------------+
(well, the key data would be sorted in a real example, but you get the idea).
Note that this does not necessarily reflect how index blocks are actually constructed in any database. This is merely an example of how you might organise a block of index data where the key data is of variable length.
I was able to make this work after altering your model. Hopefully this still fulfills the use case you are targeting. To bypass the dynamic nature of your fields, I created a field named "field" and a field named "value". This allows me to create a multikey index on known fields.
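The reshaping itself is straightforward. A sketch of the transformation (the original model isn't shown in the question, so the attribute names here are hypothetical; "adfield" is the array field the index below targets):

```python
def to_field_value_pairs(attrs):
    """Reshape a dict of dynamic attributes into an array of
    {field, value} sub-documents, so a single multikey index on
    adfield.field / adfield.value covers every attribute."""
    return [{"field": k, "value": v} for k, v in sorted(attrs.items())]

# Hypothetical document with dynamic attribute names.
doc = {"_id": 1, "adfield": to_field_value_pairs({"age": "30", "color": "red"})}
print(doc["adfield"])
```

Instead of one unknowable field name per attribute, every attribute now lives under the two known paths adfield.field and adfield.value, which is what makes a single index usable for all of them.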
Then I executed a find against this, db.myColl.find({"adfield.field":"30"}), for a before picture of the query. I also used db.myColl.explain().find() to analyze it.
Then I created the index via db.myColl.createIndex({"adfield.field":1,"adfield.value":1}).
I executed the find from above and, sure enough, the explain() output shows an IXSCAN with isMultiKey set to true.
Again; hopefully the alteration to your model still keeps with the spirit of what you are attempting to accomplish.
You can read more details about multikey indexes at http://docs.mongodb.org/manual/core/index-multikey/ .