Mongodb – Does the MongoDB background flush (MMAP) update the entire document even if only a small portion has changed? i.e. set on an array position

mongodb

We are currently trying to get to the bottom of some high background flush times on our MongoDB installation. As part of this investigation we have been looking at the $set operation we perform on an array embedded in one of our document types. The array contains 348 empty documents to begin with, and over the course of a week these array element will have sub-documents inserted (if empty) and then subsequently updated (if they already exist). The sub-documents are approximately 100 bytes in size and are not indexed.

So my question is, when the document is flushed to disk, what actually gets written, the entire document or just the sub-document that has been updated?

Best Answer

The array contains 348 empty documents to begin with, and over the course of a week these array element will have sub-documents inserted (if empty) and then subsequently updated (if they already exist). The sub-documents are approximately 100 bytes in size and are not indexed.

One consideration with this use case is that your documents are consistently growing. MongoDB used a record allocation or padding strategy to allow documents to grow in-place. For example, if your document starts off as 1000 bytes MongoDB 2.6 or newer will round this up to a 1024 byte record allocation for MMAP (as per the Power of 2 Size default strategy). Updates that don't grow the size of the document beyond the current record allocation are more efficient for the server to execute.

However, if you added 100 bytes to a document which was initially 1000 bytes, the document would have to be moved to a new record allocation in storage (and associated index entries would also have to be updated). So in this example, the next allocation for a 1100 byte document would be 2048 bytes (allowing for ~9 more 100 byte fields to be added before a new record allocation was needed for this document). Indexes in MongoDB include the storage location of the document, so a document move will result in an update for every index entry referencing that document.

You can check the frequency of document moves by looking at the nmoved value for slow updates (or by enabling increased levels of logging / system profiling). Frequent document moves can definitely have a performance impact. Common strategies include either reconsidering the data model (eg. moving the growing portion of the document to a separate collection if appropriate) or adding manual padding to the initial document allocation. The default power of 2 allocation strategy is designed to avoid the need for manual padding in most cases, but if your documents start small and grow quickly you might be able to avoid some initial document moves.

So my question is, when the document is flushed to disk, what actually gets written, the entire document or just the sub-document that has been updated?

The answer will depend on the size of your document and the nature of updates since the last background flush. I'll assume you are using a default configuration with MMAP storage engine and journal enabled.

By default data changes are written twice: once to fast append-only journal files (committed to disk every 100ms) and again to a private view in memory (flushed to data files every 60s). The background flush process is a periodic asynchronous write of all pages that have been "dirtied" in memory since the last flush. Journal commit and background flush intervals can be influenced by both server configuration and write concerns. For a good overview of the process see How MongoDB’s Journaling Works.

The MMAP storage engine will fetch the full document into memory before applying updates. The standard x86 page size is 4KiB so a single document may be represented by one or more pages -- or multiple documents may be part of a single page in memory.

So, if you are updating a single document the writes will include:

all changes written to the journal
all changes written to the oplog (if that node is part of replica set)
any pages dirtied for that document since the last background flush

An important caveat is "since the last background flush". Multiple updates affecting the same pages within a given sync interval will effectively be batched.

If you're trying to get to the bottom of performance issues then consistently high background flush times (particularly as a large or increasing percentage of the default 60s flush interval) are definitely of concern, but should be reviewed in the context of other metrics such as page faults, I/O stats, and lock percentage. I would also review the MongoDB Production Notes for general tips and upgrade to the latest MongoDB production release for your major version (i.e. latest 2.6.x or 3.0.x if there's a newer x than your current version).

Related Solutions

Mongodb – How do databases store index key values (on-disk) for variable length fields

You can store your index as a list of fixed-size offsets into the block containing your key data. For example:

+--------------+
| 3            | number of entries
+--------------+
| 16           | offset of first key data
+--------------+
| 24           | offset of second key data
+--------------+
| 39           | offset of third key data
+--------------+
| key one |
+----------------+
| key number two |
+-----------------------+
| this is the third key |
+-----------------------+

(well, the key data would be sorted in a real example, but you get the idea).

Note that this does not necessarily reflect how index blocks are actually constructed in any database. This is merely an example of how you might organise a block of index data where the key data is of variable length.

Postgresql – Using MongoDB and PostgreSQL together

Some thoughts....

Typically one does not want to store pieces of tightly interrelated information in different systems. The chances of things getting out of sync is significant and now instead of one problem on your hands you have two. One thing you can do with Mongo though is use it to pipeline your data in or data out. My preference is to keep everything in PostgreSQL to the extent this is possible. However, I would note that doing so really requires expert knowledge of PostgreSQL programming and is not for shops unwilling to dedicate to using advanced features. I see a somewhat different set of options than you do. Since my preference is not something I see listed I will give it to you.

You can probably separate your metadata into common data, data required for classes, and document data. In this regard you would have a general catalog table with the basic common information plus one table per class. In this table you would have an hstore, json, or xml field which would store the rest of the data along with columns where you are storing data that must be constrained significantly. This would reduce what you need to put in these tables per class, but would allow you to leverage constraints however you like. The three options have different issues and are worth considering separately:

hstore is relatively limited but also used by a lot of people. It isn't extremely new but it only is a key/value store, and is incapable of nested data structures, unlike json and xml.

json is quite new and doesn't really do a lot right now. This doesn't mean you can't do a lot with it, but you aren't going to do a lot out of the box. If you do you can expect to do a significant amount of programming, probably in plv8js or, if you want to stick with older environments, plperlu or plpython. json is better supported in 9.3 though at least in current development snapshots, so when that version is released things will get better.

xml is the best supported of the three, with the most features, and the longest support history. Then again, it is XML.....

However if you do decide to go with Mongo and PostgreSQL together, note that PostgreSQL supports 2 phase commit meaning you can run the write operations, then issue PREPARE TRANSACTION and if this succeeds do your atomic writes in Mongo. If that succeeds you can then COMMIT in PostgreSQL.

Best Answer

Related Solutions

Mongodb – How do databases store index key values (on-disk) for variable length fields

Postgresql – Using MongoDB and PostgreSQL together

Related Question