MongoDB – How to influence file (pre)allocation steps (maximum growth step) in MongoDB 3.4 with WiredTiger

mongodb, mongodb-3.4, wiredtiger

We have deployed some really small databases (5 GB) in our environment, using a directory per DB. Under the directories is LVM with XFS, where we have implemented a "brake" that kicks in when the DB space is about to fill up (all of this is due to some strict disk quotas – not really that interesting, I think…).
Our solution runs on MongoDB v3.4.19 on CentOS 7.5. Everything looks OK.
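(For context, this layout corresponds to a mongod configuration roughly like the following – the dbPath is just a placeholder for our real mount points:)

# illustrative mongod.conf excerpt; dbPath is a placeholder
storage:
  dbPath: /var/lib/mongo        # each DB gets its own directory/LV under here
  directoryPerDB: true
  engine: wiredTiger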

Some of the DBs have more than 60% of their disk space used by a single collection, and in our scenario we observed that while restoring a 2.3 GB dump (all into a 5 GB database dir/=disk), the preallocation after 2 GB asks the OS for more space in (I think) a power-of-2 strategy (or something very similar).

This situation trips our triggers and eventually leads to the DB being locked (which is still better than the whole cluster going down because there is no space left).
We have basically the same situation with a 10 GB DB. (And in a way, a similar situation can occur with literally any capacity…)

My question is: can I influence the (pre)allocation growth steps of Mongo/WT (i.e. cap them at something lower than ~2 GB), either for restores only or in general?

(Performance impact is not the highest priority in this case.)

I've found some promising possibilities via db.adminCommand and configStrings for storage engines, but I was not able to affect or even list some of the possible parameters, nor change anything for the better…
In particular, the file_extend parameter of WT via

db.adminCommand( { setParameter: 1, wiredTigerEngineRuntimeConfig: "<parameter>" } )

looked promising, but I didn't find a way to pass it to the cluster…
Some of my attempts ended with error 22 – Invalid argument, but I wasn't able to find out more about it…
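Concretely, the kind of call I was attempting looked like the following (note that setParameter takes the number 1 and the WT options go in as one plain configuration string, not a document; the file_extend value here is just an example size – I never got any variant of this accepted):

// hypothetical attempt: ask WiredTiger to extend data files in fixed 16 MB steps;
// "file_extend=(data=16MB)" is an example value, not a verified working setting
db.adminCommand({
    setParameter: 1,
    wiredTigerEngineRuntimeConfig: "file_extend=(data=16MB)"
})

(If the runtime route keeps returning EINVAL, file_extend may simply not be reconfigurable at runtime; the same string can presumably be passed at startup via storage.wiredTiger.engineConfig.configString, but I haven't verified that either.)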

The other option I found, newCollectionsUsePowerOf2Sizes=false, seems to be suitable for the MMAP engine only.
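For completeness, it would be set like this (though as far as I can tell it has no effect under WiredTiger, and may even be rejected there):

// MMAPv1-only server parameter, shown just for reference
db.adminCommand({ setParameter: 1, newCollectionsUsePowerOf2Sizes: false })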

This makes restoring dumps (and eventually backups) into the cluster much more bothersome…

I hope the issue is clearly explained; any hint/tip would be appreciated…
If any additional info is needed, let me know.

Update:
I've taken screenshots where (I believe) what I'm describing above is clearly visible… They show the $ ls output (upper left), the $ df output (upper right) and the mongorestore progress (bottom left). (Seeing something once is better than hearing (reading) about it a thousand times…)
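(For anyone who wants to reproduce this without screenshots, the observation boils down to watching something like the following while mongorestore runs in another terminal – the path is a placeholder for your dbPath:)

# placeholder path; refresh file sizes and free space every 5 seconds
watch -n 5 'ls -lh /var/lib/mongo/<db-dir>; df -h /var/lib/mongo'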

Beginning
First alloc (to 1.4 GB – used 650 MB)
Second alloc (to 2.4 GB – used 1.1 GB)
Third alloc (to 4.4 GB – used 2.1 GB)
At the restore end (still allocated 4.4 GB – used 2.3 GB)
Back to normal (extra space deallocated)

Thank you.

Zdenek

Best Answer

WiredTiger does not preallocate data files. Only journal files are pre-allocated (at 100 MB per pre-allocated file); see the journaling process.

The disk exhaustion you're seeing is likely due to these journal files. Journal files are optimized for quick writes, and their contents are persisted in a more permanent manner on a WiredTiger checkpoint, which occurs every ~60 seconds. Typically, once the data is persisted in the data files, you should see less disk used, since the journal files can then be removed. Note that journal files are only removed by WiredTiger following a checkpoint and at no other time.
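You can check from the shell whether journal pre-allocation is what you're seeing; the counters below are the WiredTiger log statistics as they appear in serverStatus on 3.4 (exact field names may differ between versions):

// inspect WiredTiger journal (log) pre-allocation counters
var log = db.serverStatus().wiredTiger.log;
print("pre-allocated log files prepared: " + log["pre-allocated log files prepared"]);
print("pre-allocated log files used:     " + log["pre-allocated log files used"]);

A "prepared" counter that grows during the restore, together with WiredTigerPreplog.* files appearing in the journal directory, points at journal pre-allocation rather than data file growth.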

I would suggest allowing for some temporary extra space during this restore/data-load process. Once the database is in a steady state, you should see a more accurate size.
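To sanity-check the steady-state size from the shell (storageSize is what WiredTiger has allocated on disk, dataSize the logical size):

// compare logical vs allocated size once the restore has settled
var s = db.stats(1024 * 1024);  // report in MB
print("dataSize:    " + s.dataSize + " MB");
print("storageSize: " + s.storageSize + " MB");
print("indexSize:   " + s.indexSize + " MB");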

The power-of-2 preallocation is specific to the MMAPv1 storage engine, which is deprecated. It is not relevant to WiredTiger.