I have two questions. I'll give you the facts first, then the questions.
The index I'm actually using for one of my queries has the following characteristics:
- Multikey (contains 4 keys).
- The average key length is 14 characters.
- 2 of the indexed values are 2-letter codes (country code and state code).
- 1 indexed value is numeric (between 0 and 100).
- The index size is ~395 MB.
Here's an example of what should be indexed (I don't know how MongoDB actually stores its indexes, so I'll represent it as if it were a collection document):
{
    "geolocation.statecode": "AL",
    "personnu_field": 50,
    "geolocation.countrycode": "US",
    "field1.sufield": "Awesome value"
}
Information about the collection covered by the index:
- The collection has ~6M documents.
- It's growing fast.
- It's a collection of Twitter users with some additional business-related fields.
- field1 (see the example above) is an array of subdocuments.
- avgObjSize: 3 KB.
- totalIndexSize: ~2.9 GB.
- storageSize: ~19 GB.
I ran an explain() on this long-running query:
db.crawler_users.find(
    {
        "geolocation.statecode": "AL",
        "personnu_field": { "$lte": 65, "$gte": 30 },
        "geolocation.countrycode": "US"
    },
    { "personnu_field": 1, "_id": 0 }
).hint(
    {
        "geolocation.countrycode": 1,
        "personnu_field": -1,
        "geolocation.statecode": 1,
        "field1.sufield": 1
    }
).explain()
And here's the result:
{
    "cursor" : "BtreeCursor geolocation.countrycode_1_personnu_field_-1_geolocation.statecode_1_field1.sufield_1",
    "isMultiKey" : true,
    "n" : 216,
    "nscannedObjects" : 788609,
    "nscanned" : 788609,
    "nscannedObjectsAllPlans" : 788609,
    "nscannedAllPlans" : 788609,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 128,
    "nChunkSkips" : 0,
    "millis" : 127451,
    "indexBounds" : {
        "geolocation.countrycode" : [ [ "US", "US" ] ],
        "personnu_field" : [ [ 65, -1.7976931348623157e+308 ] ],
        "geolocation.statecode" : [ [ { "$minElement" : 1 }, { "$maxElement" : 1 } ] ],
        "search.keyword" : [ [ { "$minElement" : 1 }, { "$maxElement" : 1 } ] ]
    }
}
As you can see, the query takes a long time to execute (>2 minutes), and it hits the collection even though the projected fields already exist in the index.
I have two somewhat related questions:
- Why is indexOnly: false? Isn't this supposed to be a covered query (see the explain output above)?
- I need to retrieve some additional fields from the collection (the _id and the profile picture URL). Should I add them to the index to avoid hitting the collection, even though I'll never query on them?
Best Answer
I believe this is a result of the isMultiKey: true field in the explain results. Basically, indexOnly is currently never true when isMultiKey is true. This is a known limitation of multikey indexes in general. You can find the relevant bug here:
https://jira.mongodb.org/browse/SERVER-3173
As well as some decent explanation in the linked/dupe bug here:
https://jira.mongodb.org/browse/SERVER-7595
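As a rough illustration of why this happens (my own summary of those tickets, not official wording): a multikey index stores one entry per array element, so a single matching index entry doesn't carry enough information to safely reconstruct the document's array field:

```javascript
// Hypothetical document with an array field (names taken from the question):
// { _id: 1, field1: [ { sufield: "a" }, { sufield: "b" } ] }
//
// An index on { "field1.sufield": 1 } stores TWO entries for this one document:
//   "a" -> doc 1
//   "b" -> doc 1
//
// If the server answered from the index alone, the value returned for
// field1.sufield would depend on which entry the scan happened to hit.
// So when isMultiKey is true, mongod always fetches the full document
// (indexOnly: false), even if the projection only names indexed fields.
```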
I think you have done some manual munging of the field names here for some reason, but I would guess that search.keyword (field1.sufield in your hint) is the problem here. Try an index without that as the final field and see if it performs better.

I'd recommend a separate index for those queries rather than one massive index. If you end up with too many fields in the index, you lose most of the benefit by having to scan through a massive index instead of the collection. An index that big is also likely to cause performance issues for updates and writes.
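As a sketch of what that separate index might look like (field names taken from the question; the usual guideline is equality fields first, then the range field):

```javascript
// Hypothetical leaner index tailored to this query:
// two equality fields first, the range field last.
db.crawler_users.ensureIndex({
    "geolocation.countrycode": 1,
    "geolocation.statecode": 1,
    "personnu_field": 1
})

// The query should then get tight bounds on all three fields
// (no $minElement/$maxElement ranges in indexBounds):
db.crawler_users.find(
    {
        "geolocation.countrycode": "US",
        "geolocation.statecode": "AL",
        "personnu_field": { "$gte": 30, "$lte": 65 }
    },
    { "personnu_field": 1, "_id": 0 }
).explain()
```

Since this index contains no array field, it is not multikey, so a projection limited to its fields has a chance of being covered (indexOnly: true).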