MongoDB Query Performance – Does Order of Search Terms Matter?

mongodb query-performance

Does the order of search terms in MongoDB matter? I came across this documentation from learnmongodbthehardway that cautions against the use of $ne and $nin:

If you need to use these [$nin and $ne] ensure you are filtering down using indexes as much as possible and leaving the $nin and $ne terms to the very last part of the query selector.

I always thought that the order of the search terms does not matter. I haven't produced a rich enough dataset to prove/disprove this.

Best Answer

Let's consider a collection with documents like this:

{
    _id: 1, 
    firstname: "john", 
    lastname: "doe", 
    city: "Chicago",
    // ... other fields 
}

If you are searching the collection:

db.collection.find( { firstname: "john", lastname: "doe" } )
// - or -
db.collection.find( { lastname: "doe", firstname: "john" } )

the output is the same.

Indexes are created on the fields used in the search criteria (or filter) to make searches fast. Suppose an index is created on the two fields like this:

db.collection.createIndex({ lastname: 1, firstname: 1 } )

After the index is created, both of the above queries perform the same way and use the index in the same way. The order of the fields in the search criteria does not matter (in this case).

This query also uses the above index:

db.collection.find( { lastname: "doe" } )

But the following query cannot use it efficiently, because firstname is not a leading field (prefix) of the index:

db.collection.find( { firstname: "john" } )

That is, the order in which the fields are specified in the index matters. Creating an index with:

{ lastname: 1, firstname: 1 }

is not the same as:

{ firstname: 1, lastname: 1 }
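The prefix rule described above can be sketched in plain JavaScript. This is only a simplified model, not MongoDB's actual query planner: usablePrefixLength is a hypothetical helper that counts how many leading fields of an index appear in a query filter, ignoring operator types and everything else the planner considers.

```javascript
// Hypothetical helper (not a MongoDB API): counts how many leading fields
// of the index key pattern are present in the query filter.
// Object.keys preserves the order in which the index fields were declared.
function usablePrefixLength(indexKeys, filter) {
  let count = 0;
  for (const field of Object.keys(indexKeys)) {
    if (!(field in filter)) break; // the prefix ends at the first missing field
    count += 1;
  }
  return count;
}

const index = { lastname: 1, firstname: 1 };

// The order of fields in the filter makes no difference.
const bothA = usablePrefixLength(index, { firstname: "john", lastname: "doe" });
const bothB = usablePrefixLength(index, { lastname: "doe", firstname: "john" });

// Only the leading field: still a usable prefix.
const lastnameOnly = usablePrefixLength(index, { lastname: "doe" });

// Only the second field: no usable prefix.
const firstnameOnly = usablePrefixLength(index, { firstname: "john" });

console.log(bothA, bothB, lastnameOnly, firstnameOnly); // 2 2 1 0
```

A result of 0 corresponds to the case where the index gives the query no starting point, which is why the firstname-only query behaves differently from the others.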

Query Selectivity

For a query that uses both fields, e.g.,

db.collection.find( { lastname: "doe", firstname: "john" } )

and an index on both fields, the order of the fields in the index matters. The better order is mostly determined by a factor called query selectivity.

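Query selectivity describes how much of the collection a condition excludes. Ideally, the first field of the index narrows the candidates sharply, so that the following index fields have little left to select from. For example, if there are 1 million documents in the collection and 2,000 of them have "doe" as lastname, a condition on lastname is selective, and the index { lastname: 1, firstname: 1 } suits it well. If, on the same data set, 250,000 documents have "john" as firstname, a condition on firstname is not very selective: an index that can narrow only by firstname leaves 250,000 candidates to check for lastname "doe", which is not very performant. Note that when a query has equality conditions on both fields, MongoDB can bound both keys of the compound index, so the difference shows up mainly when only the leading field can be bounded tightly (for example, with queries on the leading field alone, or with range and negation conditions).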
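The arithmetic behind this example can be checked with a few lines of JavaScript. The counts are the ones quoted above; matchFraction is just an illustrative helper name, not anything MongoDB provides.

```javascript
// Figures taken from the example in the text.
const totalDocs = 1000000;
const docsWithLastnameDoe = 2000;
const docsWithFirstnameJohn = 250000;

// Fraction of the collection that a single equality condition matches.
// A smaller fraction means a more selective condition.
function matchFraction(matching, total) {
  return matching / total;
}

const lastnameFraction = matchFraction(docsWithLastnameDoe, totalDocs);
const firstnameFraction = matchFraction(docsWithFirstnameJohn, totalDocs);

// How many times more candidates are left after narrowing by firstname only.
const candidateRatio = docsWithFirstnameJohn / docsWithLastnameDoe;

console.log((lastnameFraction * 100).toFixed(1) + "% match lastname doe");    // 0.2%
console.log((firstnameFraction * 100).toFixed(1) + "% match firstname john"); // 25.0%
console.log(candidateRatio + "x more candidates when narrowing by firstname"); // 125x
```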

In general, conditions with $ne and $nin are not very selective, since they often match a large portion of the indexed values.


How to find out if a query is using an index or not, or using the right index?

You can use the explain method on the query to generate its query plan. The plan shows whether the query uses an index at all and, when several indexes are available, which one was chosen. With the "executionStats" mode, explain also reports other information, such as how long the query took and how many index keys and documents it examined.
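As a rough sketch of how to read such a plan, the JavaScript below summarizes an explain result. In the shell the result would come from something like db.collection.find( { lastname: "doe" } ).explain("executionStats"). The sample object here is hand-written and trimmed for illustration, not real output, and summarizeExplain and collectStages are hypothetical helper names. The field names used (queryPlanner.winningPlan, stage, inputStage, executionStats.totalKeysExamined, and so on) follow the usual explain layout, but the exact shape can vary between server versions, so treat this as an assumption to verify against your own output.

```javascript
// Walks the plan tree and collects every stage name (e.g. FETCH, IXSCAN).
function collectStages(plan, out = []) {
  if (!plan) return out;
  out.push(plan.stage);
  if (plan.inputStage) collectStages(plan.inputStage, out);
  for (const child of plan.inputStages || []) collectStages(child, out);
  return out;
}

// Hypothetical helper: reduces an explain result to a few key facts.
function summarizeExplain(explainOutput) {
  const winning = explainOutput.queryPlanner.winningPlan;
  // Assumption: some newer servers nest the stage tree under "queryPlan".
  const stages = collectStages(winning.queryPlan || winning);
  const stats = explainOutput.executionStats || {};
  return {
    usesIndex: stages.includes("IXSCAN"),
    collectionScan: stages.includes("COLLSCAN"),
    keysExamined: stats.totalKeysExamined,
    docsExamined: stats.totalDocsExamined,
    returned: stats.nReturned,
    millis: stats.executionTimeMillis,
  };
}

// Hand-written sample with illustrative numbers only.
const sampleExplain = {
  queryPlanner: {
    winningPlan: {
      stage: "FETCH",
      inputStage: { stage: "IXSCAN", indexName: "lastname_1_firstname_1" },
    },
  },
  executionStats: {
    nReturned: 3,
    executionTimeMillis: 1,
    totalKeysExamined: 3,
    totalDocsExamined: 3,
  },
};

const summary = summarizeExplain(sampleExplain);
console.log(summary);
```

A plan containing IXSCAN means an index was used; COLLSCAN means the whole collection was scanned. When the number of keys and documents examined is far larger than the number returned, the index is probably not selective enough for the query.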
