MongoDB – Configure mongo for a one-shot full database read

mongodb read-only-database

I have a Kubernetes cluster on bare metal with 3 machines, each with 8 GB of RAM. All my apps and a MongoDB 4.0.9 replica set run on it.
There is an import program that:

  1. Downloads an 8 GB mongo dump from an external source and restores it into a freshly created database A. There is one collection, without indexes.
  2. Fully browses the restored collection with one find({}) and 1M next() calls.
  3. Emits an AMQP message for each document; the messages are then stored in mongo databases B, C and D.
  4. Drops database A.

Step 2 uses a lot of RAM.

How should I configure mongo database A (while leaving database B as normal) to minimize the memory footprint of the import operation? More important tasks are running in the cluster.

For example, can I configure mongo not to cache the temporary collection?

Best Answer

Starting in MongoDB 3.4, the default WiredTiger internal cache size is the larger of either:

  • 50% of (RAM - 1 GB),
  • or 256 MB.

In your case, with 8 GB of RAM, the WiredTiger cache will use up to 3.5 GB (0.5 * (8 GB - 1 GB) = 3.5 GB).
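The arithmetic above can be sketched as a small function. This is only an illustration of the sizing rule; defaultWiredTigerCacheGB is a hypothetical helper name, not a MongoDB API, and it assumes the RAM figure is given in GB.

```javascript
// Sketch of the default WiredTiger cache sizing rule (MongoDB 3.4+).
// defaultWiredTigerCacheGB is a hypothetical helper, not part of MongoDB.
function defaultWiredTigerCacheGB(totalRamGB) {
  var halfOfRamMinusOne = 0.5 * (totalRamGB - 1); // 50% of (RAM - 1 GB)
  var floorGB = 256 / 1024;                       // 256 MB expressed in GB
  // The default is whichever of the two values is larger.
  return Math.max(halfOfRamMinusOne, floorGB);
}
```

For an 8 GB machine, defaultWiredTigerCacheGB(8) gives 3.5; for a 1 GB machine the 256 MB floor applies and the result is 0.25.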

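The WiredTiger cache is shared by all databases on a mongod instance, so it cannot be limited for database A alone. It can, however, be reduced for the whole instance, for example to 1 GB, by running the following admin command only while the mongorestore and import are in progress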

db.adminCommand( { "setParameter": 1, "wiredTigerEngineRuntimeConfig": "cache_size=1G"})

and reverting it to 3.5 GB once the restore has completed.
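A minimal sketch of that reduce-then-revert sequence, written for the mongo shell, might look like the following. withReducedCache and its task argument are hypothetical names introduced for illustration; task stands in for whatever triggers the import. The sketch assumes it is run against each replica set member whose cache should be limited, and it writes 3.5 GB as "3584M" on the assumption that an integer value with a unit suffix is the safest form for cache_size.

```javascript
// Hypothetical helper: shrink the WiredTiger cache, run a task, then restore it.
// `db` is the shell's database handle; sizes are strings such as "1G" or "3584M".
function withReducedCache(db, reducedSize, originalSize, task) {
  function setCacheSize(size) {
    // Same admin command as in the answer, with the size made a parameter.
    return db.adminCommand({
      setParameter: 1,
      wiredTigerEngineRuntimeConfig: "cache_size=" + size
    });
  }

  var res = setCacheSize(reducedSize);
  if (!res.ok) {
    throw new Error("could not reduce cache: " + JSON.stringify(res));
  }
  try {
    return task();
  } finally {
    // Restore the original size even if the task throws.
    setCacheSize(originalSize);
  }
}
```

It could be called as withReducedCache(db, "1G", "3584M", function () { /* run the import */ }), so that the cache returns to its previous size whether or not the import succeeds.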

Note:

Reducing the WiredTiger cache size will increase the time needed to restore the data.