MongoDB not using wildcard nested array index

mongodbnosqlquery-performance

I have the following collection:

{
   _id: 12345,
   quizzes: [
     { 
        _id: 111111,
        done: true
     }
   ]
},
{
   _id: 78910,
   quizzes: [
     { 
        _id: 22222,
        done: false
     }
   ]
}

I want to select the documents where a certain quiz from the quizzes was done and want to make sure that it uses the appropriate index. So I use the following query:

Answer.find({ 'quizzes.0.done': true }).explain('queryPlanner');

Which returns:

queryPlanner: {
  plannerVersion: 1,
  namespace: 'iquiz.answers',
  indexFilterSet: false,
  parsedQuery: { 'quizzes.0.done': [Object] },
  winningPlan: { stage: 'COLLSCAN', filter: [Object], direction: 'forward' },
  rejectedPlans: []
}

The query is not using any index as seen from the output. I have tried the following indexes and none get used:

{ quizzes.done: 1 }
{ quizzes.[$**].done: 1 }
{ quizzes: 1 }

The only 1 that actually gets used:

{ quizzes.0.done: 1 }

However this is not really practical as I may target any quiz from the quizzes array not just the first one. Is there a certain syntax for the index in my case or this is a current limitation of mongodb? Thanks!

Best Answer

When MongoDB indexes an array, or a field in an array of subdocuments, the index is essentially a list of value -> recordId pairs.

The value part is the actual field value, and does not include any information about the index of the value within the array.

The query executor's index scan matches the queried value with the indexed values and returns a list of recordIds that then need to be fetched.

If you have additional requirements, such as an elemMatch clause or require the value to be in a specific position in the array, the documents will have to be examined after the index scan.

This means that if you are using an index on {"quizzes.done":1} to match "quizzes.0.done": true, the executory would need to retrieve all documents that contain at least 1 quiz that is done, and examine each to determine if it is the first index. In that case, a collection scan is quite likely more efficient.

You noted that the index on {"quizzes.0.done":1} works for that query, but that is because the query executor can guarantee the location of the match within the array with that index, and be certain that no other array element will match.

I'm not certain how the { quizzes.[$**].done: 1 } index might be handled, but I suspect it won't look much different than { quizzes.done: 1 }