Why is indexOnly: false? Isn't it supposed to be a covered index
query? (see the explain later)
I believe this is a result of the isMultiKey: true field in the explain results. Basically, indexOnly is currently never true when isMultiKey is true.
This is a known problem in general with multikey indexes. You can find the relevant bug here:
https://jira.mongodb.org/browse/SERVER-3173
There is also a decent explanation in the linked duplicate bug here:
https://jira.mongodb.org/browse/SERVER-7595
I think you have done some manual munging of the fields here for some reason, but I would guess that search.keywords
is the problem here. Try an index without that as the final field and see if that performs better.
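To make the multikey point a bit more concrete, here is a small plain-JavaScript model of how an index on an array field gets one entry per array element. It is only a sketch of the idea, not MongoDB's actual implementation, and the sample documents and the helper name indexEntries are made up for illustration (only search.keywords comes from the question).

```javascript
// Simplified model: an index on an array field stores one entry per element,
// which is what makes the index "multikey".
function indexEntries(docs, path) {
  const entries = [];
  for (const doc of docs) {
    // Walk the dotted path, e.g. "search.keywords".
    const value = path
      .split(".")
      .reduce((v, key) => (v == null ? undefined : v[key]), doc);
    // Arrays contribute one key per element, scalars a single key.
    const keys = Array.isArray(value) ? value : [value];
    for (const key of keys) {
      entries.push({ key, id: doc._id });
    }
  }
  return entries;
}

// Hypothetical documents, assumed for this sketch.
const docs = [
  { _id: 1, search: { keywords: ["mongo", "index"] } },
  { _id: 2, search: { keywords: ["index"] } },
];

const entries = indexEntries(docs, "search.keywords");
const isMultiKey = docs.some((d) => Array.isArray(d.search.keywords));

// Two documents produce three index entries, and no single entry
// holds the complete keywords array of document 1.
console.log(entries.length, isMultiKey);
```

In this model no individual index entry carries the whole array, which may help explain why a query touching such a field is not reported as indexOnly.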
I need to retrieve some additional fields from the collection (the id
and the profile_picture url). Should I add them to the index to avoid
hitting the collection, even if I'll never have to query them?
I'd recommend a separate index for those queries rather than a single massive index. If you end up with too many fields in the index, you are going to lose most of the benefit, because you simply end up scanning through a massive index instead of the collection. An index that big will also likely cause performance issues for updates/writes.
Remember, MongoDB has a dynamic schema. So it is perfectly ok to store this document:
{
"JobNumber" : "50001-01",
"CustomerId" : "joe",
"IdentifierNumber" : NumberLong(8812739),
"TimesPrinted" : 0,
"Packaging" : {"bundle":1200,"box":120,"pallet":3}
}
and this document
{
"JobNumber" : "50001-02",
"CustomerId" : "jane",
"IdentifierNumber" : NumberLong(8812739),
"TimesPrinted" : 0,
"Packaging" : {"sack":200}
}
in the same collection.
I wouldn't query for the Nth document, but for a given field in the subdocument, for example
db.collection.find({"Packaging.bundle":1200})
which would run just fine with MongoDB. The reason is that a field that isn't present in a document is evaluated as null for the query, and null is definitely not equal to 1200.
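The matching rule described above can be sketched in plain JavaScript. This is only a simplified model of an equality match on a dotted path (it ignores arrays and is not MongoDB's query engine); the helper names resolvePath and matchesEquality are made up for illustration, and IdentifierNumber is left out because NumberLong only exists in the mongo shell. Note that field names are case-sensitive, so the path has to use the stored spelling Packaging.

```javascript
// Resolve a dotted path; a missing field is treated as null.
function resolvePath(doc, path) {
  let value = doc;
  for (const key of path.split(".")) {
    if (value === null || typeof value !== "object" || !(key in value)) {
      return null;
    }
    value = value[key];
  }
  return value;
}

// Simplified equality match on a dotted path.
function matchesEquality(doc, path, expected) {
  return resolvePath(doc, path) === expected;
}

const jobs = [
  {
    JobNumber: "50001-01",
    CustomerId: "joe",
    TimesPrinted: 0,
    Packaging: { bundle: 1200, box: 120, pallet: 3 },
  },
  {
    JobNumber: "50001-02",
    CustomerId: "jane",
    TimesPrinted: 0,
    Packaging: { sack: 200 },
  },
];

// Only the first job has Packaging.bundle, so only it can match.
const hits = jobs
  .filter((doc) => matchesEquality(doc, "Packaging.bundle", 1200))
  .map((doc) => doc.JobNumber);

// A lowercase path resolves to null for both documents and matches nothing.
const lowercaseHits = jobs.filter((doc) =>
  matchesEquality(doc, "packaging.bundle", 1200)
);

console.log(hits, lowercaseHits.length);
```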
As for performance: it really depends on how big your collection is and what your queries look like. While the query shown above may be rather slow without an index in a collection containing hundreds of thousands of documents (or even more), it can be extremely fast once you have created an index on it, e.g.
db.collection.ensureIndex({"Packaging.bundle":1,"Packaging.box":1,"Packaging.pallet":1});
Whether you can create an index like this obviously depends on whether you really have arbitrary packaging or simply a variety of packaging options. If the latter is the case, I'd create an index for each of the packaging options, utilizing sparse indices, e.g.
db.collection.ensureIndex({"Packaging.sack":1},{sparse:true})
This would reduce the index size, as only documents which hold the field "Packaging.sack" would be contained in this index.
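The size difference can be illustrated with a small plain-JavaScript model. It is a sketch under simplifying assumptions, not how MongoDB stores indexes: the helper names getPath and buildIndex are made up, and the model only captures that a regular index keeps a null entry for documents missing the field while a sparse index skips them.

```javascript
// Resolve a dotted path; returns undefined when the field is missing.
function getPath(doc, path) {
  return path
    .split(".")
    .reduce((v, key) => (v == null ? undefined : v[key]), doc);
}

// Build a toy single-field index as a list of { key, id } entries.
function buildIndex(docs, path, { sparse = false } = {}) {
  const entries = [];
  for (const doc of docs) {
    const value = getPath(doc, path);
    if (value === undefined) {
      // A sparse index leaves out documents that lack the field;
      // a regular index records them with a null key.
      if (sparse) continue;
      entries.push({ key: null, id: doc.JobNumber });
    } else {
      entries.push({ key: value, id: doc.JobNumber });
    }
  }
  return entries;
}

const jobDocs = [
  { JobNumber: "50001-01", Packaging: { bundle: 1200, box: 120, pallet: 3 } },
  { JobNumber: "50001-02", Packaging: { sack: 200 } },
];

const regular = buildIndex(jobDocs, "Packaging.sack");
const sparse = buildIndex(jobDocs, "Packaging.sack", { sparse: true });

// The regular index has an entry for every document,
// the sparse one only for the document that has Packaging.sack.
console.log(regular.length, sparse.length);
```

With many packaging options and only a few documents per option, each sparse index in this model stays proportional to the number of documents that actually use that option.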
If you really have arbitrary fields in the documents, I wonder how you create a model for it ;)
With only a few tens of thousands of documents, you might even get satisfactory results without an index.
Your question specifies:
Your query counts all documents where the array is not empty. Is this intended?
Depending on the cardinality of the values of this field, this could make the count expensive.
To query for empty arrays, the operation would be:
Note that there are edge cases to this query. For example, the following would be returned:
The index usage of the count operation can be determined using the collection.explain() function.
For example:
The output will provide information showing the index bounds being used (if they exist).
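A minimal sketch of such an explain call, written as a small helper so the same code works in the mongo shell. The collection name items, the array field tags, and the helper name explainCount are assumptions made for illustration; substitute your own collection and filter.

```javascript
// Ask the server how it would execute a count for the given filter.
// `db` is the database handle that the mongo shell provides.
function explainCount(db, collectionName, filter) {
  return db
    .getCollection(collectionName)
    .explain("executionStats")
    .count(filter);
}

// Usage in the mongo shell (skipped when no `db` handle is available).
// The collection "items" and the field "tags" are hypothetical.
if (typeof db !== "undefined") {
  printjson(explainCount(db, "items", { tags: { $size: 0 } }));
}
```

In the returned plan, an index scan stage with index bounds indicates that an index is being used, while a collection scan stage indicates that it is not.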