I need to execute following query on mongo db:
db.testCollection.find({ dataType: { $exists: true } }, { _id: 0, date: 1 })
.sort({ date: -1 })
.limit(1)
so basically only I need to do is to find newest date from documents that have dataType
field.
I have ~20 milion documents in that collection and this query last something like 4-5 minutes.
I tried to cover all query needs by index, so I created index like this:
db.testCollection.ensureIndex(
{ dataType: 1, date: -1 },
{ partialFilterExpression: { dataType: { $exists: true } } }
)
As far as I know, this index contains all data the query needs:
- information that all documents in this index contains
dataType
field exists date
field value needed to retreived it from querydate
field value needed to sort operation
Unfortunately, mongo still have FETCH
stage in query plan to check if dataType
filed exists:
"winningPlan" : {
"stage" : "PROJECTION_SIMPLE",
"transformBy" : {
"_id" : 0,
"date" : 1
},
"inputStage" : {
"stage" : "SORT",
"sortPattern" : {
"date" : -1
},
"limitAmount" : 1,
"inputStage" : {
"stage" : "SORT_KEY_GENERATOR",
"inputStage" : {
"stage" : "FETCH",
"filter" : {
"dataType" : {
"$exists" : true
}
},
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"dataType" : 1,
"date" : -1
},
"indexName" : "dataType_1_date_-1",
"isMultiKey" : false,
"multiKeyPaths" : {
"dataType" : [ ],
"date" : [ ]
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : true,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"dataType" : [
"[MinKey, MaxKey]"
],
"date" : [
"[MaxKey, MinKey]"
]
}
}
}
}
}
}
So my question is why mongo still needs to fetch document from collections?
EDIT: When I changed $exists
operator to $ne: null
it doesn't need to execute FETCH
stage.
Best Answer
I see similar results when tried on MongoDB v4.2.8 - using the posted info.
In general, the
$exists
in a query filter uses the index on that field. For example, if there is a fieldfld1
and you run a query with a filter{ fld1: {$exists: true}}
, the query uses the index.In the query plan output, the in-memory sort ("stage" : "SORT") is because - for a compound index the filter must be an equality condition on the index prefix field (
dataType
) and then only the index will be used in a sort operation using thedate
field. The FETCH is for the SORT operation to happen in the memory (i.e., the index is not being used). So, the query is not covered.For example, a query like this will use the index (and it will be covered):
It is important to note that the query filter
{ dataType: "a" }
uses an equality condition - and,{dataType: {$exists: true}}
does not.See Sort and Non-prefix Subset of an Index