The problem is that you are making several wrong assumptions. Below you will find some corrections.
MongoDB is not optimized for small resources
MongoDB was specifically designed to take a lot of data (recording clickstreams was its first application, iirc). As far as I can tell, you have your django app on the same server as MongoDB. The problem here is that a lot of users for your django app translates into a lot of queries/aggregations/write operations on the MongoDB side. So django and MongoDB race for resources, especially during high-load times. Since django is first in the stack, it will almost always "win", for example by requesting RAM which MongoDB then can not request. So it may well happen that MongoDB refuses an operation for lack of resources, the request is cancelled, and your system appears to do nothing, while really the two parts of your application did their best to answer the request but failed for lack of resources.
To be honest: running MongoDB alone on a 1GB instance would imho not be reasonable, let alone alongside a django application. Imho, with this setup, you should have at the very least 4GB of RAM, and even that might only work until you put real load on it. For comparison: I usually suggest between 32 and 128GB of RAM per node (depending on the data, indices and a few other factors) for machines using SSDs as storage technology. Mind you, that is for MongoDB only, at an according scale of data, of course.
"Working set" does not (only) mean cache
Disclaimer: brutally simplified and terminology might be off
MMAPv1 uses memory-mapped files. All details put aside, this means that a file is treated as an addressable range of memory. So if MongoDB wants to read a certain doc, it uses a memory address and the range it wants to read. That memory address might either already be in RAM, or it has to be read from disk. Or, and here is the misconception, from the OS's filesystem cache, which, you guessed it, resides in RAM, too, just in a different part. (Iirc, what happens in this situation is that the address a pointer refers to is changed.) So not only does MongoDB hold the working set in RAM, it is also the direct cause of a sizeable part of the filesystem cache. In other words, we have another part of MongoDB requiring even more RAM than the working set alone.
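The mechanism can be sketched in a few lines of Python (an illustration of memory-mapped file access in general, using a throwaway temp file; this is not MongoDB's actual code):

```python
import mmap
import os
import tempfile

# A minimal sketch of the memory-mapped file mechanism MMAPv1 builds on
# (illustration only). A file is exposed as an addressable byte range;
# reading a slice either hits pages already in RAM (including the OS
# filesystem cache) or faults them in from disk.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"document-0" * 100)  # stand-in for a data file full of docs
    path = f.name

with open(path, "r+b") as f:
    mm = mmap.mmap(f.fileno(), 0)  # map the whole file into our address space
    doc = mm[0:10]                 # "reading a doc" = slicing an address range
    print(doc)                     # b'document-0'
    mm.close()

os.unlink(path)
```

Whether the slice comes from RAM, the filesystem cache or disk is entirely the operating system's decision, which is exactly why the RAM accounting gets murky.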
The working set is not the only thing consuming memory
- Actually, the way journaling works, it doubles the RAM required by MongoDB.
- Each connection (and remember each driver basically opens a connection pool) gets 1MB of RAM allocated.
- Operations need some memory. Let's take aggregations as an example. They are capped at 100MB of memory consumption – that alone would be 10% of your RAM, 5% of your allocatable memory.
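As a back-of-the-envelope sketch of the numbers above on a 1GB box (the pool size of 100 is an assumed, illustrative driver default; the ~1MB per connection and the 100MB aggregation cap come from the points above):

```python
# Rough memory-pressure arithmetic for a 1GB machine.
total_ram_mb = 1024
pool_size = 100                  # assumption: typical driver pool size
connection_mb = pool_size * 1    # ~1MB allocated per open connection
aggregation_cap_mb = 100         # in-memory cap for one aggregation

print(f"connection pool: ~{connection_mb}MB")
print(f"one aggregation: up to {aggregation_cap_mb}MB, "
      f"i.e. ~{100 * aggregation_cap_mb / total_ram_mb:.0f}% of total RAM")
```

Even before the working set and journal are counted, a busy connection pool plus a single large aggregation can claim a fifth of such a machine.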
However since you use MMAPv1: Do NOT turn off journaling! It is vital for crash recovery in MMAPv1.
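For the record, the corresponding mongod.conf fragment looks like this (journaling is on by default with MMAPv1; this just makes the setting explicit):

```yaml
storage:
  engine: mmapv1
  journal:
    enabled: true   # do NOT set this to false with MMAPv1
```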
Conclusion
Your machine is vastly underprovisioned in terms of RAM. Even if you have a tight budget, I can not stress the need of putting more RAM into that machine enough. I'd at least put 4GB into that machine (physical, that is, not swap) and see how it goes.
Be aware though, that with this setup, you'll always have django and MongoDB compete for resources the most when you need it the least: when your application has comparatively many concurrent users.
Okay, so after following the clues given by loicmathieu and jstell and digging into it a little, these are the things I found out about MongoDB using the WiredTiger storage engine. I'm putting it here in case anyone runs into the same questions.
The memory-usage threads that I mentioned all date from 2012-2014; they pre-date WiredTiger and describe the behavior of the original MMAPv1 storage engine, which doesn't have a separate cache or support for compression.
The WiredTiger cache settings control only the size of the memory directly used by the WiredTiger storage engine (not the total memory used by mongod). Many other things potentially take memory in a MongoDB/WiredTiger configuration, such as the following:
- WiredTiger compresses disk storage, but the data in memory are uncompressed.
- WiredTiger by default does not fsync the data on each commit, so the log files are also in RAM, which takes its toll on memory. It's also mentioned that, in order to use I/O efficiently, WiredTiger chunks I/O requests (cache misses) together; that also seems to take some RAM (in fact, dirty pages (pages that have been changed/updated) carry a list of their updates, stored in a concurrent skiplist).
- WiredTiger keeps multiple versions of records in its cache (Multi-Version Concurrency Control: read operations access the last committed version before their operation).
- WiredTiger keeps checksums of the data in cache.
- MongoDB itself consumes memory to handle open connections, aggregations, server-side code, etc.
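To see how much of this actually lands in the WiredTiger cache on a live deployment, the cache section of serverStatus can be inspected (a sketch against a running mongod; field names as reported by the server's serverStatus output):

```javascript
// In the mongo shell, against a running mongod:
var cache = db.serverStatus().wiredTiger.cache;
print(cache["bytes currently in the cache"]);
print(cache["maximum bytes configured"]);
print(cache["tracked dirty bytes in the cache"]);
```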
Considering these facts, relying on show dbs; was not technically correct, since it only shows the compressed size of the datasets.
The following commands can be used in order to get the full dataset size.
db.getSiblingDB('data_server').stats()
# OR
db.stats()
This results in the following:
{
"db" : "data_server",
"collections" : 11,
"objects" : 266565289,
"avgObjSize" : 224.8413545621088,
"dataSize" : 59934900658, # 60GBs
"storageSize" : 22959984640,
"numExtents" : 0,
"indexes" : 41,
"indexSize" : 7757348864, # 7.7GBs
"ok" : 1
}
So it seems that the actual dataset size plus its indexes take about 68GB of that memory.
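As a side note, db.stats() accepts an optional scale factor that divides all byte values, which avoids eyeballing the raw numbers (sketch, using the same database name as above):

```javascript
// Report the stats in gibibytes instead of bytes:
db.getSiblingDB('data_server').stats(1024 * 1024 * 1024)
// dataSize, storageSize and indexSize are then expressed in GB
```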
Considering all this, I guess the memory usage is now pretty much expected; the good part being that it's completely okay to limit the WiredTiger cache size, since it handles I/O operations pretty efficiently (as described above).
There also remains the problem of OOM. To overcome this issue, since we didn't have enough resources to move MongoDB out, we lowered the oom_score_adj of the important processes to prevent the OOM killer from terminating them for the time being (meaning we told the OOM killer not to kill our desired processes).
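Concretely, this is done through a per-process file in /proc (the mongod line below is a sketch and assumes a running mongod; -600 is an illustrative value):

```shell
# Lower a process's OOM score adjustment so the kernel's OOM killer prefers
# other victims (range: -1000 = never kill ... 1000 = kill first).
#   echo -600 | sudo tee /proc/"$(pidof mongod)"/oom_score_adj
# Every process has such a file; inspect this shell's own current value:
cat /proc/self/oom_score_adj
```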
Best Answer
As per MongoDB BOL here: changed in version 3.4, values can range from 256MB to 10TB and can be a float. In addition, the default value has also changed: starting in 3.4, the WiredTiger internal cache will, by default, use the larger of either:

- 50% of (RAM - 1 GB), or
- 256 MB.

With WiredTiger, MongoDB utilizes both the WiredTiger internal cache and the filesystem cache. Via the filesystem cache, MongoDB automatically uses all free memory that is not used by the WiredTiger cache or by other processes.

The storage.wiredTiger.engineConfig.cacheSizeGB setting limits the size of the WiredTiger internal cache. The operating system will use the available free memory for the filesystem cache, which allows the compressed MongoDB data files to stay in memory. In addition, the operating system will use any free RAM to buffer file system blocks and the file system cache. To accommodate the additional consumers of RAM, you may have to decrease the WiredTiger internal cache size.

For further reference: WiredTiger Storage Engine and Configuration File Options.
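A minimal mongod.conf fragment for the option mentioned above (the 1 GB value is purely illustrative; pick a size that leaves room for the filesystem cache and other processes):

```yaml
storage:
  wiredTiger:
    engineConfig:
      cacheSizeGB: 1   # illustrative value, not a recommendation
```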