MongoDB – Performance of Overwriting Array Field vs Updating Items

mongodb, nosql

The problem I am facing is essentially updating inventory in a MongoDB database when an order is received. The general hierarchy is as follows:

Manufacturer: {
    ...
    Items: [
             ...
             Item: {
                  ...
                  inventory
             }
           ]
}

Note: I will be using Scala driver syntax from here on

There are two ways to solve this problem:

1) Find() the corresponding Items array, update all the necessary items in the array (in Scala), then overwrite the entire Items field in a single query:

 set("manufacturer.$.items", updatedItems)

2) Individually update each item directly in the DB (separate queries):

for (item <- order) {
  MongoDataBaseConnector.manufacturers.findOneAndUpdate(
    and(
      equal("_id", manufacturer._id),
      equal("items._id", item._id)
    ),
    // decrement the matched array element's inventory
    // (inc with a negative value; the driver has no dec helper)
    inc("items.$.inventory", -1)
  )
}

I am curious as to which of these approaches is better in theory. Here are the downsides of each that I can think of:

1) The list of items can be significantly larger (up to the allowed document size) than the number of items in the order (e.g. 10,000 items in total, 1 item in the order): we would basically re-write 10,000 items for the sake of updating one, unless Mongo has some internal optimization for cases like this.

2) In this case the opposite: if |order| ~= |items|, then we end up making far too many queries to the DB, with the same amount of writes as approach #1.

Is one approach absolutely better?

Best Answer

Both of the described approaches are very suboptimal in theory. My strong recommendation would be to design a more efficient schema to suit your data access & update patterns.

For example, order items could be referenced in a separate collection rather than being embedded in an array. If you are currently using a single order document to ensure atomic updates, you could instead use multi-document transactions (in MongoDB 4.0 and newer) to ensure atomic updates across collections.
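As a rough sketch only (the database and collection names, document shapes, and the blocking Await are assumptions rather than a recommended pattern), a multi-document transaction with the Scala driver could look something like this:

import org.mongodb.scala._
import org.mongodb.scala.bson.ObjectId
import org.mongodb.scala.model.Filters._
import org.mongodb.scala.model.Updates._
import scala.concurrent.Await
import scala.concurrent.duration._

val client: MongoClient = MongoClient()
val db = client.getDatabase("shop")            // hypothetical database
val items = db.getCollection("items")          // one document per item
val orders = db.getCollection("orders")        // one document per order

// Record the order and decrement each item's inventory atomically
// (requires MongoDB 4.0+ and a replica set deployment).
def placeOrder(orderDoc: Document, orderedItems: Seq[(ObjectId, Int)]): Unit = {
  val session = Await.result(client.startSession().toFuture(), 10.seconds)
  session.startTransaction()
  try {
    Await.result(orders.insertOne(session, orderDoc).toFuture(), 10.seconds)
    for ((itemId, qty) <- orderedItems)
      Await.result(
        items.updateOne(session, equal("_id", itemId), inc("inventory", -qty)).toFuture(),
        10.seconds)
    Await.result(session.commitTransaction().toFuture(), 10.seconds)
  } catch {
    case e: Throwable =>
      Await.result(session.abortTransaction().toFuture(), 10.seconds)
      throw e
  } finally {
    session.close()
  }
}

In practice you would also want retry logic for transient transaction errors, as described in the driver documentation.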

Large arrays are a performance anti-pattern (particularly if indexed). Large documents will add I/O overhead: irrespective of what you are sending over the network from your application, the MongoDB server has to load a version of the full document into memory to apply changes. In modern versions of MongoDB (3.2+) the default storage engine, WiredTiger, uses Multiversion concurrency control (MVCC) to enable document-level concurrency. Sending hundreds or thousands of consecutive updates for a single document is going to add unnecessary cache pressure and contention.

To compare performance outcomes you'll have to test with representative data and deployment resources. It is not clear which of your scenarios is more likely or how many individual updates would be required, but if you have to choose the lesser of two evils I expect a single update will be more efficient. You could perhaps improve on the second scenario by sending a single update with individual fields set, but this may not be straightforward with your current schema (and will not address the implicit server I/O inefficiency).
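For example, a sketch of what that combined update might look like with your current schema, assuming filtered positional updates (arrayFilters, MongoDB 3.6+) and that each ordered item decrements inventory by one:

import org.mongodb.scala.model.Filters.equal
import org.mongodb.scala.model.Updates.{combine, inc}
import org.mongodb.scala.model.UpdateOptions
import scala.jdk.CollectionConverters._

// One update statement that decrements every ordered item's inventory,
// matching each embedded element via its own arrayFilters identifier (e0, e1, ...).
val decrements = combine(
  order.zipWithIndex.map { case (_, i) =>
    inc(s"items.$$[e$i].inventory", -1)
  }: _*
)
val elementFilters = order.zipWithIndex.map { case (item, i) =>
  equal(s"e$i._id", item._id)
}

MongoDataBaseConnector.manufacturers.updateOne(
  equal("_id", manufacturer._id),
  decrements,
  UpdateOptions().arrayFilters(elementFilters.asJava)
)

This still rewrites the large document on the server, but it reduces the round trips to a single update.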

For some schema design considerations, the three-part blog series 6 Rules of Thumb for MongoDB Schema Design might be a helpful read on One-to-N relationship modelling. The Data Models section of the MongoDB documentation also includes some concepts and example patterns.