MongoDB covered query needs to fetch and examine documents when exists operator is used

indexmongodb

I need to execute following query on mongo db:

db.testCollection.find({ dataType: { $exists: true } }, { _id: 0, date: 1 })
   .sort({ date: -1 })
   .limit(1)

so basically only I need to do is to find newest date from documents that have dataType field.
I have ~20 milion documents in that collection and this query last something like 4-5 minutes.
I tried to cover all query needs by index, so I created index like this:

db.testCollection.ensureIndex(
   { dataType: 1, date: -1 }, 
   { partialFilterExpression: { dataType: { $exists: true } } }
)

As far as I know, this index contains all data the query needs:

  • information that all documents in this index contains dataType field exists
  • date field value needed to retreived it from query
  • date field value needed to sort operation

Unfortunately, mongo still have FETCH stage in query plan to check if dataType filed exists:

"winningPlan" : {
            "stage" : "PROJECTION_SIMPLE",
            "transformBy" : {
                "_id" : 0,
                "date" : 1
            },
            "inputStage" : {
                "stage" : "SORT",
                "sortPattern" : {
                    "date" : -1
                },
                "limitAmount" : 1,
                "inputStage" : {
                    "stage" : "SORT_KEY_GENERATOR",
                    "inputStage" : {
                        "stage" : "FETCH",
                        "filter" : {
                            "dataType" : {
                                "$exists" : true
                            }
                        },
                        "inputStage" : {
                            "stage" : "IXSCAN",
                            "keyPattern" : {
                                "dataType" : 1,
                                "date" : -1
                            },
                            "indexName" : "dataType_1_date_-1",
                            "isMultiKey" : false,
                            "multiKeyPaths" : {
                                "dataType" : [ ],
                                "date" : [ ]
                            },
                            "isUnique" : false,
                            "isSparse" : false,
                            "isPartial" : true,
                            "indexVersion" : 2,
                            "direction" : "forward",
                            "indexBounds" : {
                                "dataType" : [
                                    "[MinKey, MaxKey]"
                                ],
                                "date" : [
                                    "[MaxKey, MinKey]"
                                ]
                            }
                        }
                    }
                }
            }
        }

So my question is why mongo still needs to fetch document from collections?

EDIT: When I changed $exists operator to $ne: null it doesn't need to execute FETCH stage.

Best Answer

I see similar results when tried on MongoDB v4.2.8 - using the posted info.

In general, the $exists in a query filter uses the index on that field. For example, if there is a field fld1 and you run a query with a filter { fld1: {$exists: true}}, the query uses the index.

In the query plan output, the in-memory sort ("stage" : "SORT") is because - for a compound index the filter must be an equality condition on the index prefix field (dataType) and then only the index will be used in a sort operation using the date field. The FETCH is for the SORT operation to happen in the memory (i.e., the index is not being used). So, the query is not covered.

For example, a query like this will use the index (and it will be covered):

db.test.find({ dataType: "a" }, { date: 1, _id: 0 }).sort({ date: -1 })

It is important to note that the query filter { dataType: "a" } uses an equality condition - and, {dataType: {$exists: true}} does not.

See Sort and Non-prefix Subset of an Index