I also asked this same question (in slightly expanded form) on the mongodb-user mailing list, where I got an answer; read the thread there for more details. The short answer is that the strategy used in the question should work fine, but there's an issue that makes it very inefficient. Hopefully, the issue will be fixed soon.
For my case, I only need to query for exact matches for the tuple {n,v}, so I can create a multikey index:
db.mycollection.ensureIndex({"attrs":1})
and make the query like this:
db.mycollection.find({"attrs": {n: "subject", v: "Some subject"}})
which works great and uses the index very effectively.
You can get overall (instance-wide) hit info from the db.serverStatus() command, and in particular:
http://docs.mongodb.org/manual/reference/command/serverStatus/#serverStatus.indexCounters.hits
http://docs.mongodb.org/manual/reference/command/serverStatus/#serverStatus.indexCounters.misses
These numbers are totals seen for the lifetime of the process, so if you want to get a rate or see the trend over time, then you will need to record them periodically and diff the values accordingly. Thankfully, if you install MMS Monitoring, it will do that for you and graph the results.
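The periodic diffing can be sketched in plain JavaScript; the snapshot objects below are hypothetical stand-ins for the indexCounters sections of two successive db.serverStatus() calls:

```javascript
// Two hypothetical snapshots of serverStatus().indexCounters, taken
// e.g. 60 seconds apart. Real values would come from db.serverStatus().
const earlier = { hits: 120000, misses: 300 };
const later = { hits: 126000, misses: 330 };
const intervalSeconds = 60;

// The raw counters are lifetime totals for the mongod process, so the
// per-second rates over the sampling interval are the differences
// divided by the interval length.
const hitRate = (later.hits - earlier.hits) / intervalSeconds;
const missRate = (later.misses - earlier.misses) / intervalSeconds;

console.log(hitRate, missRate); // 100 hits/s, 0.5 misses/s
```

This is exactly the kind of sampling-and-diffing that MMS Monitoring automates for you.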
However, there are currently no per-index statistics available. The relevant feature request can be found here for tracking and voting purposes:
https://jira.mongodb.org/browse/SERVER-2227
Update: January 2016
The stats referenced above have been removed from the output of the serverStatus command, so are no longer available. However, the referenced feature request is now complete, and is available in version 3.2 in the form of the $indexStats aggregation stage (the linked docs also contain sample output). For completeness, here's an example I put together:
Prior to any queries I have just 2 indexes, the default _id and indexme, both with 0 ops:
> db.timecheck.aggregate( [ { $indexStats: { } } ] ).pretty()
{
    "name" : "indexme_1",
    "key" : {
        "indexme" : 1
    },
    "host" : "localhost:27017",
    "accesses" : {
        "ops" : NumberLong(0),
        "since" : ISODate("2016-01-12T19:03:01.358Z")
    }
}
{
    "name" : "_id_",
    "key" : {
        "_id" : 1
    },
    "host" : "localhost:27017",
    "accesses" : {
        "ops" : NumberLong(0),
        "since" : ISODate("2016-01-12T18:59:24.292Z")
    }
}
Then run a couple of finds to bump the ops on indexme and check again:
> db.timecheck.find({indexme: 33})
> db.timecheck.find({indexme: 55})
> db.timecheck.aggregate( [ { $indexStats: { } } ] ).pretty()
{
    "name" : "indexme_1",
    "key" : {
        "indexme" : 1
    },
    "host" : "localhost:27017",
    "accesses" : {
        "ops" : NumberLong(2),
        "since" : ISODate("2016-01-12T19:03:01.358Z")
    }
}
{
    "name" : "_id_",
    "key" : {
        "_id" : 1
    },
    "host" : "localhost:27017",
    "accesses" : {
        "ops" : NumberLong(0),
        "since" : ISODate("2016-01-12T18:59:24.292Z")
    }
}
Best Answer
The order of the fields in the index definition is important.
The indexed values are sorted in the order the fields appear in the spec, so in the index { _id: 1, Field2: 1, Field4: 1, "Field1.date": 1 }, the values will be sorted first by _id, then by Field2, and so on. This means that in order to find documents with dates in the range you specified, it would have to examine every index entry. In that case it is probably just as efficient to scan all of the documents.
Instead, build an index on { "Field1.date": 1, _id: 1, Field2: 1, Field4: 1 }. That will allow the query executor to find all of the matching documents by scanning a single contiguous subset of the index, and it will have all of the projected fields available to avoid fetching any documents.
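A sketch of that in the mongo shell (the collection name and date bounds are placeholders, and this assumes none of the indexed fields hold arrays, since multikey indexes cannot cover queries):

```javascript
// Reordered index: the range field first, then the other projected fields.
db.mycoll.createIndex({ "Field1.date": 1, _id: 1, Field2: 1, Field4: 1 })

// A range query on Field1.date that projects only indexed fields can then
// be answered from the index alone (a covered query), with no document fetches.
db.mycoll.find(
    { "Field1.date": { $gte: ISODate("2016-01-01"), $lt: ISODate("2016-02-01") } },
    { _id: 1, Field2: 1, Field4: 1 }
).explain("executionStats")
// The winning plan should show an IXSCAN stage with no FETCH stage.
```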