Mongodb – How does the MongoDB query optimizer evaluate candidate plans

mongodboptimization

The MongoDB query optimizer processes queries and chooses the most
efficient query plan for a query given the available indexes

This is an excerpt from MongoDB documentation.
I am looking for more information about how the query optimizer evaluates candidate plans.

  • Does the query optimizer use collections statistics ? Indeed, we can
    think that a COLSCAN has not the same cost for a collection with
    1000 docs or a collection with 100 millions docs.

  • When we run explain("allPlansExecution"), we get a lot of information about each evaluated plans. But, among following, what are the selection criteria in order of importance to determine the winning plan ? totalKeysExamined, totalDocsExamined, executionTimeMillis, nReturned, memUsage, other… ?

Best Answer

Does the query optimizer use collections statistics ?

The query optimizer does not (as at MongoDB 4.0) use collection statistics.

Plan evaluation is based on comparing candidate plans for a given query shape to see which returns the first batch of results (by default 101 documents) with the least amount of overall "work". The works score is a proxy for different effort involved in query stages (index key comparisons, fetching documents, etc). If multiple plans perform identical work during evaluation, there are some small tie-breaking bonuses that can help choose a plan to cache (for example, if a plan is a covered query or does not require an in-memory sort).

The winning plan will be cached for future queries matching the same query shape, but will be re-evaluated if performance declines over time or there are other relevant changes (for example, indexes are added/removed). See Query Plans in the MongoDB documentation for a general overview.

For best outcomes on common queries you generally want to have an ideal index for your query shape rather than several competing candidate indexes. An ideal index would include the fields in your query and support using the index to sort query results (if you have sort criteria). For a worked example, the Optimizing MongoDB Compound Indexes blog post is a helpful read (note: this uses explain output from an older version of MongoDB, but the general concepts are still applicable).

If adding an ideal index isn't possible and the query optimizer isn't able to deterministically chose the best index given your data, you could consider using an index hint to override the query optimization process and force a specific index to be used. Index hints should be used sparingly as they can force suboptimal choices and will ignore any future indexes added that might be more suitable.

When we run explain("allPlansExecution"), we get a lot of information about each evaluated plans. But, among following, what are the selection criteria in order of importance to determine the winning plan ? totalKeysExamined, totalDocsExamined, executionTimeMillis, nReturned, memUsage, other... ?

From a query scoring perspective, the works value for each query stage is the key indicator. Metrics like totalKeysExamined and totalDocsExamined are useful for a quick comparison of efficiency for the number of results returned (nReturned). Ideally the ratio of totalKeysExamined to nReturned would be 1:1 (a single index lookup for every result), but some complex queries will necessarily have to examine more values. Other execution metrics will vary based on the query stage, but these can provide insight into calculation of the works value for that stage. For a more detailed walkthrough, I recommend the Reading the .explain() Output presentation from Charlie Swanson.

If you want to better understand the execution stats for a specific query, I suggest posting a new question on DBA StackExchange with your specific MongoDB server version and output of explain("allPlansExecution"). You may also find it helpful to investigate using a UI tool like MongoDB Compass, which includes a Visual Tree of explain output which is easier to explore than raw JSON.