Mongodb what is the cost of array queries

arraymongodb

As far as I can tell, find queries in mongodb involving an array appear very expensive. Take as example the queries given in: https://docs.mongodb.com/manual/tutorial/query-array-of-documents/

db.inventory.insertMany( [
   { item: "journal", instock: [ { warehouse: "A", qty: 5 }, { warehouse: "C", qty: 15 } ] },
   { item: "notebook", instock: [ { warehouse: "C", qty: 5 } ] },
   { item: "paper", instock: [ { warehouse: "A", qty: 60 }, { warehouse: "B", qty: 15 } ] },
   { item: "planner", instock: [ { warehouse: "A", qty: 40 }, { warehouse: "B", qty: 5 } ] },
   { item: "postcard", instock: [ { warehouse: "B", qty: 15 }, { warehouse: "C", qty: 35 } ] }
]);

And find query:

db.inventory.find( { "instock": { warehouse: "A" } } )

The above example can be described as simple as "Find all records connected to warehouse A". But what is the cost of the find query (big o)? Are there any indexing optimizations going on here, or is it basically an exhaustive search with a cost of O(N*K), where N is the number of inventory records, and K is the number of elements in the instock array of each record? and if so, are there any ways of optimizing this to minimize cost, by say indexing?

Best Answer

what is the cost of the find query (big o)?

Unindexed queries are going to involve a collection & document scan, so a search complexity of O(N*K) to iterate all documents and array field values.

Indexed queries use a B-tree search which would have a search complexity of O(log K) for a simple query and appropriate index (where K is the number of elements). MongoDB automatically flags an index as multikey if any indexed field is an array. The multikey flag is used to determine correctness of query bounds when an array field is used in a query with multiple predicates.

db.inventory.find( { "instock": { warehouse: "A" } } )

This query would have no results for the given example data because it is strictly matching the embedded document { warehouse: "A" } (which would not match embedded documents with additional fields like qty).

To achieve the outcome you described, the query should instead use dot notation to search the embedded warehouse values:

db.inventory.find( { "instock.warehouse": "A" } )

Are there any indexing optimizations going on here

You can explain query results with execution stats to view the general work that goes into a given query:

db.inventory.find( { "instock.warehouse": "A" } ).explain('executionStats')

High level metrics include keysExamined (index key comparisons) and docsExamined (documents fetched). There are many other metrics that might be of interest including query stages and index bounds.

Adding an appropriate index to support your query will illustrate the difference:

db.inventory.createIndex({"instock.warehouse": 1})

The examples here are simple array queries. For details on more complex queries involving multikey indexes, see Multikey Index Bounds in the MongoDB manual. Where possible, internal optimisations attempt to minimise the range of values that need to be scanned when a multikey index is used.