MongoDB data directory space is increasing

mongodb

I'm worried: my MongoDB data directory is growing very rapidly.

I have two data files, each of which has reached the 2GB limit, and MongoDB keeps spawning further 2GB files. Is there any way I can compress the data?

Best Answer

The short answer: No

No, you can't do so easily.

The long answer: As always, it depends

First: the reason those data files are created is simple. mongod preallocates a new data file as soon as the first document is written to the previously preallocated one, in order to avoid unnecessary latency when the space of a new data file is needed. So once your data files have reached the 2GB size, you will always have at least 2GB more space allocated than you need. As space is cheap, this is a reasonable procedure.
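To get a feel for the overhead, here is a minimal sketch of the allocation arithmetic. It assumes the file-size progression of the classic MMAPv1 storage engine, which is not stated above: data files start at 64MB and double in size until they hit the 2GB cap. It also ignores the namespace file, the journal and any padding, so treat the result as a rough estimate. The function name is made up for illustration.

```python
MB = 1024 * 1024
FIRST_FILE = 64 * MB   # assumption: size of the first data file under MMAPv1
MAX_FILE = 2048 * MB   # the 2GB cap mentioned above


def allocated_bytes(data_bytes):
    """Return (file_sizes, total_bytes) for a database holding data_bytes.

    Simplified model: documents are packed perfectly, and the next file is
    preallocated as soon as the current one receives its first byte.
    """
    sizes = []
    size = FIRST_FILE
    capacity = 0
    # Allocate files until the data fits (there is always at least one file).
    while capacity < data_bytes or not sizes:
        sizes.append(size)
        capacity += size
        size = min(size * 2, MAX_FILE)
    # The file following the last one in use is already preallocated.
    if data_bytes > 0:
        sizes.append(size)
    return sizes, sum(sizes)


sizes, total = allocated_bytes(5 * 1024 * MB)  # 5GB of documents
print(len(sizes), total // MB)  # 8 8128
```

Under these assumptions, 5GB of documents occupy eight files and roughly 8GB on disk, and the last 2GB file holds no data at all.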

What can happen is that the data in the data files becomes fragmented. When a document is deleted, it is marked as such in the data files, and the space it occupied can be reused. A new document may or may not fit into the formerly occupied space. If it doesn't fit, a matching free space is searched for. If there isn't any, the new document is written to the preallocated data file (as documents are never fragmented), and the preallocation of a new data file is triggered. This is why a new data file can be preallocated even though there is mathematically enough space in the data files for the new document. There are a few possibilities to reclaim disk space, but all of them require some time and effort.
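The effect is easy to reproduce with a toy model. The sketch below is not MongoDB's actual allocator. It only assumes, as described above, that a document must go into a single contiguous free slot. The function name and the slot sizes are invented for illustration.

```python
def place_document(free_slots, doc_size):
    """Try to place a document into the first free slot that is big enough.

    free_slots is a list of sizes (in bytes) of the holes left by deleted
    documents. Returns the index of the slot used, or None if the document
    has to go to the end of the data, i.e. into the preallocated file.
    """
    for index, slot_size in enumerate(free_slots):
        if slot_size >= doc_size:
            # Shrink the hole by the space the new document occupies.
            free_slots[index] = slot_size - doc_size
            return index
    return None


# Three deleted documents left 900 bytes of free space in total...
holes = [300, 300, 300]
# ...but a 500-byte document fits into none of the holes.
print(sum(holes) >= 500, place_document(holes, 500))  # True None
# A 200-byte document reuses the first hole.
print(place_document(holes, 200), holes)  # 0 [100, 300, 300]
```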

Option 1: using mongodump

You can create a dump of your data using mongodump, drop the database, delete the data files if necessary, and use mongorestore to restore your database. Make sure you use the --oplog parameter, if applicable. During the restore, all documents are written contiguously and without data file fragmentation, which may result in a smaller number of data files, effectively reclaiming some disk space. However, there is no guarantee that disk space will be freed.

There are a few drawbacks to this option:

  • you need enough free disk space to hold a full dump
  • without the --oplog option, the dump does not reliably include writes made while it is running, so data written to the database between the start of the dump and its completion may be lost
  • indices aren't dumped, but rebuilt on restore. Hence, a restore may take a Very Long Time.
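As a rough sketch, the dump and restore calls could be assembled like this. It assumes that mongodump and mongorestore are on the PATH and that mongod listens on the default host and port. The --out and --oplogReplay flags are not mentioned above: --out sets the dump directory, and --oplogReplay is the restore-side counterpart of --oplog. Note that --oplog only works for a full-instance dump taken from a replica set member. All function names are hypothetical, and dropping the database between the two steps is deliberately left as a manual step.

```python
import shutil
import subprocess


def dump_command(out_dir, use_oplog=True):
    """Build the mongodump call for a full dump of the instance."""
    cmd = ["mongodump", "--out", out_dir]
    if use_oplog:
        # Also record the oplog entries written while the dump is running.
        cmd.append("--oplog")
    return cmd


def restore_command(dump_dir, use_oplog=True):
    """Build the matching mongorestore call."""
    cmd = ["mongorestore"]
    if use_oplog:
        # Replay the oplog entries captured by mongodump --oplog.
        cmd.append("--oplogReplay")
    cmd.append(dump_dir)
    return cmd


def enough_space(path, needed_bytes):
    """Check that the volume holding path has at least needed_bytes free."""
    return shutil.disk_usage(path).free >= needed_bytes


def run(cmd):
    """Run one of the commands above and raise if it fails."""
    subprocess.run(cmd, check=True)
```

A typical sequence would be to check enough_space() for the dump target with your own estimate of the dump size, call run(dump_command(...)), drop the database by hand, and then call run(restore_command(...)).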

Option 2: "Abusing" the initial sync mechanism of replica sets

You may use MongoDB's mechanism for initially syncing a new member of a replica set to achieve the same result as described above, without those risks. On the other hand, arbitrarily removing a member from the replica set reduces your redundancy. The procedure is documented in the MongoDB docs.

Option 3: (Ab)use repairDatabase

This option may be used to compact the database and reclaim disk space, but it has a lot of caveats, as described in the documentation of the repairDatabase command. Don't use it to reclaim disk space unless you are absolutely, positively sure that your database and its collections are in a working state and no other option is available. If you do, do it at your own risk. You have been warned.
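One of those caveats is disk space. As far as I recall the documentation for the MMAPv1-era releases, repairDatabase needs free space equal to the size of the current data set plus 2GB. The sketch below is a pre-flight check built on that assumption. It uses the size of the files under the dbpath as a stand-in for the data set size, and both function names are made up for illustration. Check the documentation for your version before relying on the numbers.

```python
import os
import shutil

GB = 1024 ** 3


def directory_size(path):
    """Sum the sizes of all regular files below path, in bytes."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if os.path.isfile(file_path):
                total += os.path.getsize(file_path)
    return total


def can_repair(dbpath, headroom=2 * GB):
    """Check whether the volume holding dbpath has room for a repair.

    Assumption: the repair needs the current data size plus `headroom`.
    """
    needed = directory_size(dbpath) + headroom
    return shutil.disk_usage(dbpath).free >= needed
```

If can_repair() returns False for your data directory, the repair would most likely run out of space halfway through, which is one more reason to prefer the other options.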