The MongoDB query optimizer processes queries and chooses the most
efficient query plan for a query given the available indexes
This is an excerpt from MongoDB documentation.
I am looking for more information about how the query optimizer evaluates candidate plans.
-
Does the query optimizer use collections statistics ? Indeed, we can
think that a COLSCAN has not the same cost for a collection with
1000 docs or a collection with 100 millions docs. -
When we run
explain("allPlansExecution")
, we get a lot of information about each evaluated plans. But, among following, what are the selection criteria in order of importance to determine the winning plan ? totalKeysExamined, totalDocsExamined, executionTimeMillis, nReturned, memUsage, other… ?
Best Answer
The query optimizer does not (as at MongoDB 4.0) use collection statistics.
Plan evaluation is based on comparing candidate plans for a given query shape to see which returns the first batch of results (by default 101 documents) with the least amount of overall "work". The
works
score is a proxy for different effort involved in query stages (index key comparisons, fetching documents, etc). If multiple plans perform identical work during evaluation, there are some small tie-breaking bonuses that can help choose a plan to cache (for example, if a plan is a covered query or does not require an in-memory sort).The winning plan will be cached for future queries matching the same query shape, but will be re-evaluated if performance declines over time or there are other relevant changes (for example, indexes are added/removed). See Query Plans in the MongoDB documentation for a general overview.
For best outcomes on common queries you generally want to have an ideal index for your query shape rather than several competing candidate indexes. An ideal index would include the fields in your query and support using the index to sort query results (if you have sort criteria). For a worked example, the Optimizing MongoDB Compound Indexes blog post is a helpful read (note: this uses explain output from an older version of MongoDB, but the general concepts are still applicable).
If adding an ideal index isn't possible and the query optimizer isn't able to deterministically chose the best index given your data, you could consider using an index hint to override the query optimization process and force a specific index to be used. Index hints should be used sparingly as they can force suboptimal choices and will ignore any future indexes added that might be more suitable.
From a query scoring perspective, the
works
value for each query stage is the key indicator. Metrics liketotalKeysExamined
andtotalDocsExamined
are useful for a quick comparison of efficiency for the number of results returned (nReturned
). Ideally the ratio oftotalKeysExamined
tonReturned
would be 1:1 (a single index lookup for every result), but some complex queries will necessarily have to examine more values. Other execution metrics will vary based on the query stage, but these can provide insight into calculation of theworks
value for that stage. For a more detailed walkthrough, I recommend the Reading the .explain() Output presentation from Charlie Swanson.If you want to better understand the execution stats for a specific query, I suggest posting a new question on DBA StackExchange with your specific MongoDB server version and output of
explain("allPlansExecution")
. You may also find it helpful to investigate using a UI tool like MongoDB Compass, which includes a Visual Tree of explain output which is easier to explore than raw JSON.