The problem is that you are making several wrong assumptions. Below you will find some corrections.
MongoDB is not optimized for small resources
MongoDB was specifically designed to take a lot of data (recording clickstreams was its first application, iirc). As far as I can tell, you have your django app on the same server as MongoDB. The problem here is that a lot of users for your django app translates into a lot of queries/aggregations/write operations on the MongoDB side. So django and MongoDB race for resources, especially during high-load times. Since django is first in the stack, it will almost always "win", for example by requesting RAM which MongoDB then can not request. So it may well happen that MongoDB refuses an operation for lack of resources, the request is cancelled, and your system appears to do nothing, while really the two parts of your application did their best to answer the request but failed for lack of resources.
To be honest: running MongoDB alone on a 1GB instance would imho not be reasonable, let alone alongside a django application. Imho, with this setup, you should have at the very least 4GB of RAM, and even that might only work until you put real load on it. For comparison: I usually suggest between 32 and 128GB of RAM per node (depending on the data, indices and a few other factors) for machines using SSDs as storage technology. Mind you, that is for MongoDB only, at an according scale of data, of course.
"Working set" does not (only) mean cache
Disclaimer: brutally simplified and terminology might be off
MMAPv1 uses memory-mapped files. All details put aside, this means that a file is treated as an addressable range of memory. So if MongoDB wants to read a certain doc, it uses a memory address and the range it wants to read. That memory address might either already be in RAM, or it has to be read from disk. Or, and here is the misconception, from the OS's filesystem cache, which, you guessed it, resides in RAM, too, just in a different part. (Iirc, what happens in this situation is that the address a pointer refers to is changed.) So not only does MongoDB hold the working set in RAM, it is also the direct cause of a sizeable part of the filesystem cache. In other words, we have another part of MongoDB requiring even more RAM than the working set alone.
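The mechanism can be sketched in a few lines of Python (an illustration of memory-mapped file access in general, using a throwaway temp file; this is not MongoDB's actual code):

```python
import mmap
import os
import tempfile

# A minimal sketch of the memory-mapped file mechanism MMAPv1 builds on
# (illustration only). A file is exposed as an addressable byte range;
# reading a slice either hits pages already in RAM (including the OS
# filesystem cache) or faults them in from disk.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"document-0" * 100)  # stand-in for a data file full of docs
    path = f.name

with open(path, "r+b") as f:
    mm = mmap.mmap(f.fileno(), 0)  # map the whole file into our address space
    doc = mm[0:10]                 # "reading a doc" = slicing an address range
    print(doc)                     # b'document-0'
    mm.close()

os.unlink(path)
```

Whether the slice comes from RAM, the filesystem cache or disk is entirely the operating system's decision, which is exactly why the RAM accounting gets murky.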
The working set is not the only thing consuming memory
- Actually, the way journaling works, it doubles the RAM required by MongoDB.
- Each connection (and remember each driver basically opens a connection pool) gets 1MB of RAM allocated.
- Operations need some memory. Let's take aggregations as an example. They are capped at 100MB of memory consumption – that alone would be 10% of your RAM, 5% of your allocatable memory.
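As a back-of-the-envelope sketch of the numbers above on a 1GB box (the pool size of 100 is an assumed, illustrative driver default; the ~1MB per connection and the 100MB aggregation cap come from the points above):

```python
# Rough memory-pressure arithmetic for a 1GB machine.
total_ram_mb = 1024
pool_size = 100                  # assumption: typical driver pool size
connection_mb = pool_size * 1    # ~1MB allocated per open connection
aggregation_cap_mb = 100         # in-memory cap for one aggregation

print(f"connection pool: ~{connection_mb}MB")
print(f"one aggregation: up to {aggregation_cap_mb}MB, "
      f"i.e. ~{100 * aggregation_cap_mb / total_ram_mb:.0f}% of total RAM")
```

Even before the working set and journal are counted, a busy connection pool plus a single large aggregation can claim a fifth of such a machine.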
However since you use MMAPv1: Do NOT turn off journaling! It is vital for crash recovery in MMAPv1.
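For the record, the corresponding mongod.conf fragment looks like this (journaling is on by default with MMAPv1; this just makes the setting explicit):

```yaml
storage:
  engine: mmapv1
  journal:
    enabled: true   # do NOT set this to false with MMAPv1
```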
Conclusion
Your machine is vastly underprovisioned in terms of RAM. Even if you have a tight budget, I can not stress the need of putting more RAM into that machine enough. I'd at least put 4GB into that machine (physical, that is, not swap) and see how it goes.
Be aware though, that with this setup, you'll always have django and MongoDB compete for resources the most when you need it the least: when your application has comparatively many concurrent users.
Okay, so after following the clues given by loicmathieu and jstell and digging into it a little, these are the things I found out about MongoDB using the WiredTiger storage engine. I'm putting it here in case anyone runs into the same questions.
The memory-usage threads that I mentioned all date from 2012-2014; they pre-date WiredTiger and describe the behavior of the original MMAPv1 storage engine, which doesn't have a separate cache or support for compression.
The WiredTiger cache settings control only the size of the memory directly used by the WiredTiger storage engine (not the total memory used by mongod). Many other things potentially take memory in a MongoDB/WiredTiger configuration, such as the following:
- WiredTiger compresses disk storage, but the data in memory are uncompressed.
- WiredTiger by default does not fsync the data on each commit, so the log files are also in RAM, which takes its toll on memory. It's also mentioned that, in order to use I/O efficiently, WiredTiger chunks I/O requests (cache misses) together; that also seems to take some RAM (in fact, dirty pages (pages that have been changed/updated) carry a list of their updates, stored in a concurrent skiplist).
- WiredTiger keeps multiple versions of records in its cache (Multi-Version Concurrency Control: read operations access the last committed version before their operation).
- WiredTiger keeps checksums of the data in cache.
- MongoDB itself consumes memory to handle open connections, aggregations, server-side code, etc.
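To see how much of this actually lands in the WiredTiger cache on a live deployment, the cache section of serverStatus can be inspected (a sketch against a running mongod; field names as reported by the server's serverStatus output):

```javascript
// In the mongo shell, against a running mongod:
var cache = db.serverStatus().wiredTiger.cache;
print(cache["bytes currently in the cache"]);
print(cache["maximum bytes configured"]);
print(cache["tracked dirty bytes in the cache"]);
```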
Considering these facts, relying on show dbs; was not technically correct, since it only shows the compressed size of the datasets.
The following commands can be used in order to get the full dataset size.
db.getSiblingDB('data_server').stats()
# OR
db.stats()
This results in the following:
{
"db" : "data_server",
"collections" : 11,
"objects" : 266565289,
"avgObjSize" : 224.8413545621088,
"dataSize" : 59934900658, # 60GBs
"storageSize" : 22959984640,
"numExtents" : 0,
"indexes" : 41,
"indexSize" : 7757348864, # 7.7GBs
"ok" : 1
}
So it seems that the actual dataset size plus its indexes take about 68GB of that memory.
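As a side note, db.stats() accepts an optional scale factor that divides all byte values, which avoids eyeballing the raw numbers (sketch, using the same database name as above):

```javascript
// Report the stats in gibibytes instead of bytes:
db.getSiblingDB('data_server').stats(1024 * 1024 * 1024)
// dataSize, storageSize and indexSize are then expressed in GB
```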
Considering all this, I guess the memory usage is now pretty much expected; the good part being that it's completely okay to limit the WiredTiger cache size, since it handles I/O operations pretty efficiently (as described above).
There also remains the problem of OOM. To overcome this issue, since we didn't have enough resources to move MongoDB out, we lowered the oom_score_adj of the important processes to prevent the OOM killer from terminating them for the time being (meaning we told the OOM killer not to kill our desired processes).
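Concretely, this is done through a per-process file in /proc (the mongod line below is a sketch and assumes a running mongod; -600 is an illustrative value):

```shell
# Lower a process's OOM score adjustment so the kernel's OOM killer prefers
# other victims (range: -1000 = never kill ... 1000 = kill first).
#   echo -600 | sudo tee /proc/"$(pidof mongod)"/oom_score_adj
# Every process has such a file; inspect this shell's own current value:
cat /proc/self/oom_score_adj
```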
Best Answer
As per MongoDB BOL here: changed in version 3.4, values can range from 256MB to 10TB and can be a float. In addition, the default value has also changed: starting in 3.4, the WiredTiger internal cache will, by default, use the larger of either:

- 50% of (RAM - 1 GB), or
- 256 MB.

With WiredTiger, MongoDB utilizes both the WiredTiger internal cache and the filesystem cache. Via the filesystem cache, MongoDB automatically uses all free memory that is not used by the WiredTiger cache or by other processes.

The storage.wiredTiger.engineConfig.cacheSizeGB setting limits the size of the WiredTiger internal cache. The operating system will use the available free memory for the filesystem cache, which allows the compressed MongoDB data files to stay in memory. In addition, the operating system will use any free RAM to buffer file system blocks and the file system cache. To accommodate the additional consumers of RAM, you may have to decrease the WiredTiger internal cache size.

For further reference: WiredTiger Storage Engine and Configuration File Options.
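A minimal mongod.conf fragment for the option mentioned above (the 1 GB value is purely illustrative; pick a size that leaves room for the filesystem cache and other processes):

```yaml
storage:
  wiredTiger:
    engineConfig:
      cacheSizeGB: 1   # illustrative value, not a recommendation
```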