MongoDB replication throttled due to Mongo using all system memory

memory mongodb replication

We have a small three-member replica set running MongoDB 3.4:

  • Primary. Physical local server, Windows Server 2012, 64 GB RAM, 6 cores. Hosted in Scandinavia.
  • Secondary. Amazon EC2, Windows Server 2016, r4.2xlarge, 61 GB RAM, 8 vCPUs. Hosted in Germany.
  • Arbiter. Tiny cloud-based Linux instance.

We have noticed that the replication speed from the primary to the secondary is throttled (capped) at just over 20 Mbit/s; it never goes above that rate. If there is more data to replicate than that, it queues up and we get replication lag.
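To put numbers on this, here is a minimal Python sketch of how the lag and the effect of the cap could be quantified. It assumes a dictionary shaped like the output of `replSetGetStatus` (a `members` list with `name`, `stateStr` and `optimeDate`); the host names and timestamps are made up for illustration, and the helper names are hypothetical.

```python
from datetime import datetime


def replication_lag_seconds(status):
    """Return {member name: lag in seconds} for every SECONDARY.

    `status` is assumed to look like the document returned by
    replSetGetStatus. Arbiters are skipped because they hold no data.
    """
    members = status["members"]
    primary = next(m for m in members if m["stateStr"] == "PRIMARY")
    return {
        m["name"]: (primary["optimeDate"] - m["optimeDate"]).total_seconds()
        for m in members
        if m["stateStr"] == "SECONDARY"
    }


def mbit_to_gb_per_day(mbit_per_s):
    """Convert a rate in Mbit/s to decimal gigabytes per day."""
    bytes_per_second = mbit_per_s * 1e6 / 8
    return bytes_per_second * 86400 / 1e9


# Hypothetical sample data, not taken from the real servers.
sample_status = {
    "members": [
        {"name": "primary.example.net:27017", "stateStr": "PRIMARY",
         "optimeDate": datetime(2018, 1, 1, 12, 0, 0)},
        {"name": "secondary.example.net:27017", "stateStr": "SECONDARY",
         "optimeDate": datetime(2018, 1, 1, 11, 58, 30)},
        {"name": "arbiter.example.net:27017", "stateStr": "ARBITER"},
    ]
}

print(replication_lag_seconds(sample_status))
# {'secondary.example.net:27017': 90.0}
print(mbit_to_gb_per_day(20))
# 216.0
```

A cap of 20 Mbit/s works out to 2.5 MB/s, or roughly 216 GB per day, so any sustained write load above that will make the lag grow.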

This is not a pure bandwidth issue: while the throttling is taking place, we can transfer data between the two servers over FTP at far more than 20 Mbit/s. Also, the throttling sometimes stops for a couple of days, notably after a MongoDB restart, and then comes back. Nothing in the logs explains why.

After a lot of experimentation and debugging, we have found the cause of the throttling: over time, MongoDB consumes all available memory on the primary member. Once this state is reached (typically a few days after a restart), throttling sets in. About half of the memory is used by the MongoDB process itself (the WiredTiger cache), and the remainder is consumed by memory-mapped collections. This seems to be by design, according to the MongoDB website: "Via the filesystem cache, MongoDB automatically uses all free memory that is not used by the WiredTiger cache or by other processes."
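The "about half" figure is consistent with the default WiredTiger cache size documented for MongoDB 3.4, which is the larger of 50% of (RAM - 1 GB) or 256 MB. A small sketch of that arithmetic for the primary's 64 GB follows; the function name is hypothetical, and it assumes the cache size has been left at its default.

```python
def default_wiredtiger_cache_gb(ram_gb):
    """Default WiredTiger cache size in GB as documented for MongoDB 3.4:
    the larger of 50% of (RAM - 1 GB) or 256 MB."""
    return max(0.5 * (ram_gb - 1), 0.25)


ram_gb = 64  # RAM of the primary, from the server list above
cache_gb = default_wiredtiger_cache_gb(ram_gb)

print(cache_gb)           # 31.5
print(ram_gb - cache_gb)  # 32.5
```

The remaining 32.5 GB or so is what the operating system, the filesystem cache and other processes share. Note that the `storage.wiredTiger.engineConfig.cacheSizeGB` setting changes only the WiredTiger cache itself, not how much the filesystem cache uses.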

Now on to the questions:

  • Can anyone think of the reason why low available memory leads to replication throttling?
  • Is there a way in MongoDB to limit the amount of memory used for memory mapping? (We haven't found any.)
  • Is there a recommended way in Windows to do the same? (Feels like the wrong end to solve the problem.)

For now, we have an extremely hacky workaround: a small .NET script continuously monitors the available memory, and whenever it is low, the script allocates 10 GB of memory (which is taken from MongoDB) and then immediately releases it. The result is that the server has 10 GB of available memory, which takes MongoDB around one day to fill up, and during that time there is no throttling. Yes, this script actually fixes the problem 🙂
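The logic of the workaround can be sketched roughly as follows. This is a Python illustration, not the actual .NET script: the function names, the low-memory threshold and the polling interval are all assumptions, and the caller has to supply a function that reports the available memory (for example one built on the third-party `psutil` package).

```python
import time


def reclaim(num_bytes, page_size=4096):
    """Allocate num_bytes, write to every page so the OS has to back the
    allocation with physical memory, then release it again."""
    block = bytearray(num_bytes)
    for offset in range(0, num_bytes, page_size):
        block[offset] = 1
    del block


def monitor(get_available_bytes, low_water_bytes, reclaim_bytes,
            poll_seconds=60, max_polls=None,
            reclaim_fn=reclaim, sleep=time.sleep):
    """Poll available memory; whenever it drops below low_water_bytes,
    allocate and release reclaim_bytes. Returns the number of reclaims.

    With max_polls=None the loop runs until the process is stopped.
    """
    polls = 0
    reclaims = 0
    while max_polls is None or polls < max_polls:
        if get_available_bytes() < low_water_bytes:
            reclaim_fn(reclaim_bytes)
            reclaims += 1
        sleep(poll_seconds)
        polls += 1
    return reclaims
```

With `psutil` installed, a call could look like `monitor(lambda: psutil.virtual_memory().available, low_water_bytes=1 * 1024**3, reclaim_bytes=10 * 1024**3)`, where the 1 GB threshold is an arbitrary example value and the 10 GB matches the amount described above.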

Best Answer

This sounds like the bug fixed by WT-2670, where mapped memory was not automatically released when no longer in use. Have you tried upgrading to the latest version?