MongoDB is using most of the disk space for its journal and not cleaning it up

mongodb

I am running MongoDB in production with journaling enabled. I use it to store application data, and I also use GridFS to store images. Each of my MongoDB instances (3 in total, in a replica set) has a 10GB Elastic Block Store (EBS) volume.

When I checked the disk usage, I was surprised to see that the journal folder is using almost all of the space. The details follow.

bitnami@ip-172-31-25-9:~/stack/mongodb/data/db$ pwd
/home/bitnami/stack/mongodb/data/db
bitnami@ip-172-31-25-9:~/stack/mongodb/data/db$ du -h *
65M     admin.0
16M     admin.ns
65M     bhs.0
129M    bhs.1
257M    bhs.2
513M    bhs.3
16M     bhs.ns
3.1G    journal
64M     local.0
1.1G    local.1
16M     local.ns
4.0K    mongod.lock
4.0K    _tmp
bitnami@ip-172-31-25-9:~/stack/mongodb/data/db$ df -h journal
Filesystem      Size  Used Avail Use% Mounted on
/dev/xvda1      9.8G  7.0G  2.4G  76% /

However, when I ran mongodump on my data, the exported data was only 500MB, i.e. most of the space is being used by the journal. I understand that the journal records every write operation, which makes its files large. What surprises me is that it is not deleting old transactions.

Ideally, I believe journal entries should be deleted once a write has been flushed to disk. Is this journal folder size expected, or am I missing some configuration? Should I plan to increase the disk volume soon?

Please advise.

Best Answer

MongoDB (with the MMAP storage engine) allocates 3 journal files by default, at 1GiB each. That is where your journal-related space usage comes from, and it will not grow unless you have a very high insert rate.

You can start mongod with the smallfiles option and reduce the journal to 3 x 128MiB if you wish, but be aware that the maximum size of your data files will also be reduced (to 512MiB each), so there will be many more of them and they will need to be allocated more often as you add data.
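To make the trade-off concrete, here is a rough Python sketch of the arithmetic. It assumes the MMAP file-sizing rules as I understand them: data files start at 64MiB and double up to 2GiB by default, while smallfiles starts them at 16MiB (my assumption, not stated above) and caps them at 512MiB. The function names are made up for illustration.

```python
def data_file_sizes_mib(count, smallfiles=False):
    """Sizes in MiB of the first `count` data files of one database.

    Assumed rule: each file is double the previous one, up to a cap.
    """
    size, cap = (16, 512) if smallfiles else (64, 2048)
    sizes = []
    for _ in range(count):
        sizes.append(size)
        size = min(size * 2, cap)
    return sizes


def journal_footprint_mib(smallfiles=False, files=3):
    """Space in MiB taken by the preallocated journal files."""
    per_file = 128 if smallfiles else 1024
    return files * per_file


default_files = data_file_sizes_mib(4)               # bhs.0 .. bhs.3 in the question
small_files = data_file_sizes_mib(7, smallfiles=True)
default_journal = journal_footprint_mib()             # 3 x 1GiB
small_journal = journal_footprint_mib(smallfiles=True)

print(default_files, default_journal)
print(small_files, small_journal)
```

The default numbers line up with the du output in the question: four data files of roughly 64M, 128M, 256M and 512M, and a journal folder of about 3G.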

As for whether to increase your storage, that depends on how much data you intend to add to the database - it will need to allocate new data files to store any data you insert, so it really is dependent on your planned usage as to whether 10GiB is enough or not.

Related Solutions

MongoDB – real disk usage is LOWER than dbStats.storageFile

Whatever you do, do not shut down that mongod process until you have backed up your data (see below). There are missing files in that database directory, and I suspect they have been manually deleted at the OS level. The data files should not have any gaps in them, ever. In other words, you should have files starting at myBase.0 and running all the way up to myBase.37, with no gaps in the numbers.

To explain: if you delete the files using rm or similar at the OS level, it will succeed because the OS allows it. However, because the running mongod process has an open file handle to the files, they will not actually be deleted by the operating system until you stop the process.

Here's an example of what the lsof command shows for a normal data file called foo.0:

mongod     5786             adam  mem       REG                9,0   67108864 805306654 /data/db/test0/foo.0

And here is what it looks like when you have manually deleted the file:

mongod     5786             adam   24u      REG                9,0   67108864 805306654 /data/db/test0/foo.0 (deleted)

From within MongoDB that file still exists and is accessible: I can query, run db.stats() etc. successfully. But if that mongod process is restarted, the file will be removed and the data is at that point essentially gone (barring efforts to undelete at the filesystem level).
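This is ordinary POSIX behaviour rather than anything specific to mongod, and it is easy to reproduce. The following minimal Python sketch assumes Linux or another POSIX system (it will not work the same way on Windows); the function name is made up for illustration.

```python
import os
import tempfile


def read_after_unlink(payload):
    """Write payload to a temp file, delete the file while it is still
    open, then read the data back through the open handle."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "w+b") as handle:
        handle.write(payload)
        handle.flush()
        os.unlink(path)                      # same effect as rm on the path
        still_listed = os.path.exists(path)  # the directory entry is gone
        handle.seek(0)
        data = handle.read()                 # the contents are still readable
    # Once the last handle is closed, the OS actually frees the space.
    return still_listed, data


result = read_after_unlink(b"mongod data file contents")
print(result)
```

The file disappears from the directory listing straight away, but its contents stay readable until the handle is closed, which is the state the mongod process above is in.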

So, what should you do? Well, the first thing is to make sure you have a copy of the data before shutting down that process and losing it. To do that you have a couple of options:

If this is a node in a replica set (even a single-node one), add a new secondary set member and let it sync. That will still succeed, and you will then have a fully populated version of the data ready to take over on that secondary. (Note: if this is not a replica set, you can't turn it into one without a process restart, and that would delete the data. My recommendation is to always run as a replica set, even a single node, for anything in production.)
Run mongodump to dump the data out somewhere else before it gets deleted. This won't be fast, and you will need plenty of space, but at least it will give you an easily restorable version of your data.

A repair on the database might work, but only if you have enough space to accommodate 2x the data plus index size on that disk. It must be a repair command, not a restart with --repair because the restart would cause the files to be deleted.

Finally, you need to figure out what is deleting these files and stop it. Is there a cron job or other process that automatically deletes large files (the data files will usually be 2GB) over a certain age or similar? I have seen jobs like that wipe out MongoDB data files before, with similar results.

MongoDB Disk Usage with Capped Collections

You have basically allocated space by defining two capped collections, the oplog and the second capped collection in MyDatabase.

Unless you specify the oplogSize, the oplog will be allocated at 5% of free space on the volume containing the MongoDB data (so I am guessing you had ~170GB free).

The second capped collection size would have been defined by you, and then the _id index would be created and grown in addition to the initial allocation as you added data. The files also contain the data from the other collections, as you mentioned.

The key thing to remember here is that capped collections pre-allocate the entire amount specified in advance and never grow or shrink. Apart from the indexes that will be added on user-defined capped collections (the oplog has no indexes), the capped collection will remain the same size regardless of how much data you put into it, whether that is 500MB or 500TB. Think of it as a fixed-size circular buffer.
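As an analogy only, the circular-buffer behaviour can be sketched in Python with a bounded deque. This is not how MongoDB implements it: a capped collection is bounded by bytes on disk, while this sketch bounds the number of documents purely for illustration.

```python
from collections import deque

# A buffer with room for 3 "documents"; its capacity never changes.
capped = deque(maxlen=3)

# Insert 5 documents. Once the buffer is full, each new insert
# silently pushes out the oldest entry.
for doc_id in range(1, 6):
    capped.append({"_id": doc_id})

remaining_ids = [doc["_id"] for doc in capped]
print(remaining_ids)
```

Only the three newest documents remain, and the buffer takes up the same room whether 5 or 5 million documents have passed through it.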

If you want to see this in action, try creating another capped collection of, say, 500MB in a different database called foo. You will end up with foo.0 through foo.4 (possibly foo.5, depending on the version) and have 960MB (or 1984MB, again depending on the version) of files on disk. That represents the minimum needed to contain a capped collection of that size.
