MongoDB – Does the MongoDB background flush (MMAP) update the entire document even if only a small portion has changed, e.g. a $set on an array position?

mongodb

We are currently trying to get to the bottom of some high background flush times on our MongoDB installation. As part of this investigation we have been looking at the $set operation we perform on an array embedded in one of our document types. The array contains 348 empty documents to begin with, and over the course of a week these array elements will have sub-documents inserted (if empty) and then subsequently updated (if they already exist). The sub-documents are approximately 100 bytes in size and are not indexed.

So my question is, when the document is flushed to disk, what actually gets written, the entire document or just the sub-document that has been updated?

Best Answer

"The array contains 348 empty documents to begin with, and over the course of a week these array elements will have sub-documents inserted (if empty) and then subsequently updated (if they already exist). The sub-documents are approximately 100 bytes in size and are not indexed."

One consideration with this use case is that your documents are consistently growing. MongoDB uses a record allocation or padding strategy to allow documents to grow in place. For example, if your document starts off as 1000 bytes, MongoDB 2.6 or newer will round this up to a 1024-byte record allocation for MMAP (as per the default Power of 2 Sizes strategy). Updates that don't grow the document beyond its current record allocation are more efficient for the server to execute.

However, if you added 100 bytes to a document which was initially 1000 bytes, the document would have to be moved to a new record allocation in storage (and associated index entries would also have to be updated). So in this example, the next allocation for a 1100 byte document would be 2048 bytes (allowing for ~9 more 100 byte fields to be added before a new record allocation was needed for this document). Indexes in MongoDB include the storage location of the document, so a document move will result in an update for every index entry referencing that document.
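The arithmetic above can be sketched in a few lines of Python. This is a simplified model, not MongoDB's actual allocator: it assumes allocations are plain powers of two starting at a hypothetical 32-byte minimum, and it ignores record header overhead and any special handling of very large documents. The function names are made up for illustration.

```python
def power_of_2_allocation(size_bytes, minimum=32):
    """Round a size up to the next power of two (simplified model)."""
    allocation = minimum
    while allocation < size_bytes:
        allocation *= 2
    return allocation


def count_moves(initial_size, growth_per_update, updates):
    """Count how often a steadily growing document outgrows its allocation."""
    size = initial_size
    allocation = power_of_2_allocation(size)
    moves = 0
    for _ in range(updates):
        size += growth_per_update
        if size > allocation:
            # Document no longer fits: it moves to a larger record.
            allocation = power_of_2_allocation(size)
            moves += 1
    return moves, size, allocation


print(power_of_2_allocation(1000))   # 1024
print(power_of_2_allocation(1100))   # 2048
print(count_moves(1000, 100, 10))    # (1, 2000, 2048)
```

Under these assumptions a 1000-byte document that grows by 100 bytes per update moves once on the first update and then fits in its 2048-byte allocation until it reaches 2000 bytes; the next 100-byte addition would trigger another move.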

You can check the frequency of document moves by looking at the nmoved value logged for slow updates (or by enabling increased levels of logging or system profiling). Frequent document moves can definitely have a performance impact. Common strategies include either reconsidering the data model (e.g. moving the growing portion of the document to a separate collection if appropriate) or adding manual padding to the initial document allocation. The default power of 2 allocation strategy is designed to avoid the need for manual padding in most cases, but if your documents start small and grow quickly, manual padding might let you avoid some of the initial document moves.
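As a rough illustration of reviewing nmoved, the sketch below totals document moves per namespace from profiler entries that have already been read into Python dicts (for example from the system.profile collection). It assumes each entry carries the op, ns and nmoved fields, with nmoved absent when nothing moved; the function name, the namespace and the sample data are all made up.

```python
def summarize_moves(profile_entries):
    """Total nmoved per namespace across update operations."""
    totals = {}
    for entry in profile_entries:
        if entry.get("op") != "update":
            continue
        moved = entry.get("nmoved", 0)  # assumed absent when no move happened
        if moved:
            ns = entry.get("ns", "unknown")
            totals[ns] = totals.get(ns, 0) + moved
    return totals


# Hypothetical profiler entries, for illustration only.
sample = [
    {"op": "update", "ns": "mydb.events", "millis": 120, "nmoved": 1},
    {"op": "update", "ns": "mydb.events", "millis": 95, "nmoved": 2},
    {"op": "update", "ns": "mydb.events", "millis": 3},
    {"op": "query", "ns": "mydb.events", "millis": 8},
]

print(summarize_moves(sample))  # {'mydb.events': 3}
```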

"So my question is, when the document is flushed to disk, what actually gets written, the entire document or just the sub-document that has been updated?"

The answer will depend on the size of your document and the nature of the updates since the last background flush. I'll assume you are using a default configuration with the MMAP storage engine and journaling enabled.

By default data changes are written twice: once to fast append-only journal files (committed to disk every 100ms) and again to a private view in memory (flushed to data files every 60s). The background flush process is a periodic asynchronous write of all pages that have been "dirtied" in memory since the last flush. Journal commit and background flush intervals can be influenced by both server configuration and write concerns. For a good overview of the process see How MongoDB’s Journaling Works.

The MMAP storage engine will fetch the full document into memory before applying updates. The standard x86 page size is 4KiB so a single document may be represented by one or more pages -- or multiple documents may be part of a single page in memory.
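To make the document-to-page relationship concrete, here is a minimal sketch that computes which 4 KiB pages a byte range overlaps. The file offset (8192) and the layout (348 sub-documents packed contiguously at exactly 100 bytes each, with no BSON overhead) are assumptions for illustration. Whether an update writes only the changed bytes or rewrites the whole record depends on the update itself; the sketch only shows which pages each range would touch.

```python
PAGE_SIZE = 4096  # standard x86 page size in bytes


def pages_spanned(offset, length, page_size=PAGE_SIZE):
    """Return the range of page numbers overlapped by [offset, offset + length)."""
    if length <= 0:
        return range(0)
    first = offset // page_size
    last = (offset + length - 1) // page_size
    return range(first, last + 1)


doc_offset = 8192        # hypothetical position of the document in the data file
sub_doc_size = 100       # approximate size from the question
doc_size = 348 * sub_doc_size

# The whole document overlaps nine pages.
print(list(pages_spanned(doc_offset, doc_size)))
# [2, 3, 4, 5, 6, 7, 8, 9, 10]

# The first sub-document sits inside a single page.
print(list(pages_spanned(doc_offset, sub_doc_size)))
# [2]

# The sub-document at index 40 straddles a page boundary.
print(list(pages_spanned(doc_offset + 40 * sub_doc_size, sub_doc_size)))
# [2, 3]
```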

So, if you are updating a single document the writes will include:

  • all changes written to the journal
  • all changes written to the oplog (if that node is part of replica set)
  • any pages dirtied for that document since the last background flush

An important caveat is "since the last background flush". Multiple updates affecting the same pages within a given sync interval will effectively be batched.
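The batching effect can be illustrated with a toy model that groups writes by flush interval and collects the distinct pages dirtied in each one. It assumes a fixed 60-second interval aligned to time zero and 4 KiB pages; the timestamps and offsets are invented, and a real flush is not scheduled this neatly.

```python
def dirty_pages_per_flush(writes, flush_interval=60.0, page_size=4096):
    """Group writes by flush interval and collect the distinct pages touched.

    writes: iterable of (timestamp_seconds, offset, length) tuples.
    Returns a dict mapping interval index to a set of page numbers.
    """
    dirty = {}
    for timestamp, offset, length in writes:
        if length <= 0:
            continue
        interval = int(timestamp // flush_interval)
        first = offset // page_size
        last = (offset + length - 1) // page_size
        dirty.setdefault(interval, set()).update(range(first, last + 1))
    return dirty


# Four hypothetical updates: three in the first interval, one in the second.
writes = [
    (1.0, 8192, 100),
    (20.0, 8192, 100),
    (45.0, 8292, 100),
    (61.0, 8192, 100),
]

result = dirty_pages_per_flush(writes)
print(result)  # {0: {2}, 1: {2}}
print(sum(len(pages) for pages in result.values()))  # 2
```

In this model four updates result in only two page writes by the background flush, because the three updates in the first interval all land on the same page.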

If you're trying to get to the bottom of performance issues then consistently high background flush times (particularly as a large or increasing percentage of the default 60s flush interval) are definitely of concern, but should be reviewed in the context of other metrics such as page faults, I/O stats, and lock percentage. I would also review the MongoDB Production Notes for general tips and upgrade to the latest MongoDB production release for your major version (i.e. latest 2.6.x or 3.0.x if there's a newer x than your current version).