Mongodb – Why mongo is choosing the wrong index / execution plan

mongodb

In production mongod logfile, we can see a lot of slow queries. Most of them are not using the best index and consequently the best execution plan is not used. However, when I run the same queries by myself in mongo shell, the correct index is used. So why for same query, we don't have same execution plan ?

Mongodb version : 4.0.2 (standalone)

Excerpt of mongod logfile :

2018-12-28T13:55:28.282+0100 I COMMAND [conn1032115] command mydb.products command: find { find: "products", filter: { origins_tags: "italia" }, sort: { last_modified_t: -1 }, skip: 320, limit: 20, singleBatch: false, maxTimeMS: 0, tailable: false, noCursorTimeout: false, awaitData: false, allowPartialResults: false, $readPreference: { mode: "secondaryPreferred" }, $db: "mydb" } planSummary: IXSCAN { last_modified_t: -1 } keysExamined:721542 docsExamined:721542 cursorExhausted:1 numYields:5637 nreturned:3 reslen:23886 locks:{ Global: { acquireCount: { r: 5638 } }, Database: { acquireCount: { r: 5638 } }, Collection: { acquireCount: { r: 5638 } } } protocol:op_query 5166ms

We can see following information :

planSummary: IXSCAN { last_modified_t: -1 }
keysExamined:721542 docsExamined:721542 (basically, the whole collection is examined)
nreturned:3
duration: 5166ms

However, these indexes are existing in collection :

db.products.createIndex({"origins_tags": 1,"sortkey": -1}, { background: true })
db.products.createIndex({"last_modified_t": -1}, { background: true })

Here is the optimized execution plan (the expected one):

planSummary: IXSCAN {"origins_tags": 1,"sortkey": -1}
keysExamined:331 docsExamined: 331
nreturned:3
duration: 11ms

So we can see a huge difference !

Best Answer

these indexes are existing in collection :

db.products.createIndex({"origins_tags": 1,"sortkey": -1}, { background: true })
db.products.createIndex({"last_modified_t": -1}, { background: true })

The MongoDB query optimizer chooses the most efficient query plan based on the query shape (combination of query predicate, sort, and projection) and candidate indexes. If there are multiple candidate plans for a given query shape, a trial evaluation will run to determine which index returns the initial batch of results (101 documents) with the least amount of "work" measured. The winning plan will be cached and periodically reevaluated if query performance or other circumstances change (for example, adding or dropping indexes).

Since neither index is ideal for your query, index selection may vary depending on which index returns the initial batch of results faster during plan evaluation:

The first index is more selective given origins_tags as filter criteria, but it requires a document fetch and in-memory sort (a blocking query stage with a 32MB limit) in order to return results ordered by last_modified. The work required to use this index for the given query will depend on the total number of matching documents: every matching document has to be fetched for the sort. The worst case scenario would be a query that needs to perform an in-memory sort of more than 32MB of data and results in an exception.
The second index returns results in the desired sort order but requires a collection scan to filter by origins_tags. The work required to use this index depends on how quickly the matches are found: this index can stream matches as they are found and stop as soon as there are enough matches. The worst case scenario would be a full collection scan to confirm there are no further matching documents.

If evaluating the two plans result in a tie (both appear to perform the same work to return initial results for a given query shape), the plan that doesn't require an in-memory sort will win.

You can see details of plan evaluation by explaining a query with allPlansExecution mode: db.products.find({...}).explain('allPlansExecution').

keysExamined:721542 docsExamined:721542 (basically, the whole collection is examined)

This is the worst case scenario for the second index: there are 323 matches out of 721,542 documents and you've asked for results 320-340 (via skip: 20, limit: 20).

I don't understand how the second index can give faster results during plan evaluation

Plan evaluation only considers which candidate plan returns the initial batch of results (101 documents) with less overall effort. The query planner does not run the plans to completion during evaluation, or keep any metrics on the distribution of key values. The cached query plan would have been based on a comparison where the second index did not not have to perform the collection scan.

When mongo choose a bad plan, it should be able to see that the execution time is not good and it should not choose same plan for next queries.

Cached plans are re-evaluated if the performance declines, as per the Query Plans flow chart in the MongoDB documentation. However, if you have multiple candidate plans without a deterministic winner, it is possible for a query plan to be chosen that is faster for initial results (or the same query shape with different values), but suboptimal for later results.

How can I fix this?

To fix this issue you should add a compound index supporting your filter & sort criteria:

db.products.createIndex({"origins_tags": 1,"last_modified_t": -1}, { background: true })

Alternatively (and less ideally) you could provide an index hint to force the query to use the first index. Note that hinting will ignore any future indexes added that might be more ideal, and will also fail with an exception if the in-memory sort needs to work with more than 32MB of data.

Adding the suggested compound index would result in the most efficient ratio of keys and docs examined to the number of results returned.

Related Solutions

Mongodb – Why is Mongo stuck in STARTUP2

The answer of eoinbrazil is partly incorrect. A new Node can be in STARTUP2 for a long time. The link the posted says:

Each member of a replica set enters the STARTUP2 state as soon as mongod finishes loading that member’s configuration, at which time it becomes an active member of the replica set. The member then decides whether or not to undertake an initial sync. If a member begins an initial sync, the member remains in STARTUP2 until all data is copied and all indexes are built. Afterwards, the member transitions to RECOVERING.

I'm administering a 700 GB collection and, when I add a new node the STARTUP2 state remains well over 24 hours. But you can still see if there is something happening, by watching if the database grows. You can see the size of the database on the new node with

show databases

or you can also observe the data directory, to see if it is still growing. (on linux with the commands ls, df, du, iotop, etc ....)

Mongodb – Why mongo write only one partition

There is no inconsistency here, this is simply due to how you have the volumes mounted and where you have located you MongoDB database path. You have one logical volume mounted on root (/) and the other mounted on /home - I am guessing you do not have your database path in MongoDB set to something under /home instead it is set to something else which is under the root path.

The most common database paths for MongoDB are /data/db (the default) and /var/lib/mongodb (default on Ubuntu for example), but you can set it to whatever you want. Given how you have your mount points configured, both of those common defaults would go under the root mount point and hence onto the /dev/mapper/vg_db1-lv_root volume. If you switch your database path to be under /home or you re-do your mount points appropriately the data will then be placed on the second volume.

Best Answer

How can I fix this?

Related Solutions

Mongodb – Why is Mongo stuck in STARTUP2

Mongodb – Why mongo write only one partition

Related Question