MongoDB – Huge size on MongoDB’s GridFS. Should I compact?

mongodb replication

I’m running MongoDB and GridFS to store small files (less than 20MB each). The servers are part of a replica set. I currently have more than 350000 files stored.

I’ve noticed that the chunks collection takes around 700GB of preallocated space, whereas the actual chunks amount to ~40GB. Even though 700GB is already preallocated, the allocation keeps expanding over time.

Keep in mind that every 15 minutes or so I delete files older than 5 days. So in theory my fs.chunks and fs.files size should remain around the same over time.

Here are my fs.chunks stats:

rs0:PRIMARY> db.fs.chunks.stats()
{
    "ns" : "collection.fs.chunks",
    "count" : 470388,
    "size" : 43295062144,
    "avgObjSize" : 92041.17057407927,
    "storageSize" : 757794040352,
    "numExtents" : 373,
    "nindexes" : 2,
    "lastExtentSize" : 2146426864,
    "paddingFactor" : 1,
    "systemFlags" : 1,
    "userFlags" : 0,
    "totalIndexSize" : 40356736,
    "indexSizes" : {
        "_id_" : 17431232,
        "files_id_1_n_1" : 22925504
    },
    "ok" : 1
}

Is this behaviour normal? Can I compact (defrag?) the chunks collection, or even reclaim that preallocated space? If I cannot reclaim that space (which I’m 99.9% sure I can’t), is there a way to ensure that the preallocated space will eventually be used rather than the allocation continuing to expand?

Best Answer

You have a few options here.

  1. Run the compact database command on the collection. This will not release disk space back to the OS, but it defragments that specific collection so the freed space can be reused.
  2. Run repairDatabase. This returns disk space to the OS. repairDatabase runs over the entire DB (or shard), not just one collection. Read the documentation first: repairDatabase locks the DB it is repairing and needs roughly as much free disk space as the DB currently occupies (so about twice the DB size in total).
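
As a rough sketch of what the two options look like as commands, assuming an older MMAPv1-style deployment like the one the stats output suggests: the helper names defragmentChunks and reclaimDiskSpace are made up for illustration, and db stands for the mongo shell handle of the database that owns fs.chunks. Both commands block the database while they run, so on a replica set it is probably safer to try them on a secondary first.

```javascript
// Hypothetical helpers; `db` is assumed to be the mongo shell database handle
// (or any object exposing a compatible runCommand method).

// Option 1: defragment a single collection. On MMAPv1 this does not shrink
// the data files; it only makes the already allocated space reusable.
function defragmentChunks(db) {
  return db.runCommand({ compact: "fs.chunks" });
}

// Option 2: rebuild the whole database and hand unused space back to the OS.
// Needs plenty of free disk space and locks the database while it runs.
function reclaimDiskSpace(db) {
  return db.runCommand({ repairDatabase: 1 });
}

// In the mongo shell, connected to the member you want to maintain:
//   defragmentChunks(db)
//   reclaimDiskSpace(db)
```

Each helper returns the server's reply document, so checking its ok field tells you whether the command succeeded.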

Please keep in mind that MongoDB preallocates data files ahead of need, so it always tends to allocate more.
If, as you mentioned, you're deleting a lot of data, MongoDB will reuse the freed space, but that space can be fragmented into 'junk' blocks that new documents don't fit into, so MongoDB ends up using less than it has actually allocated. This is why it keeps allocating more and more data files.
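
To put a number on the gap between allocated and used space, one can compare the size and storageSize fields from the stats output. The snippet below is a minimal sketch in plain JavaScript (it runs under Node.js); storageUsage is a hypothetical helper name, and the two figures are copied from the db.fs.chunks.stats() output in the question.

```javascript
// Hypothetical helper: summarise how much of a collection's allocated storage
// actually holds live documents. `stats` has the shape of db.<coll>.stats().
function storageUsage(stats) {
  const GiB = 1024 * 1024 * 1024;
  return {
    dataGiB: stats.size / GiB,                    // bytes of live documents
    allocatedGiB: stats.storageSize / GiB,        // bytes allocated to the collection
    usedFraction: stats.size / stats.storageSize  // share of the allocation in use
  };
}

// Figures taken from the question's fs.chunks stats.
const usage = storageUsage({ size: 43295062144, storageSize: 757794040352 });
console.log(
  usage.dataGiB.toFixed(1) + " GiB of data in " +
  usage.allocatedGiB.toFixed(1) + " GiB allocated (" +
  (usage.usedFraction * 100).toFixed(1) + "% in use)"
);
```

In the mongo shell the same function can be fed db.fs.chunks.stats() directly. A used fraction that keeps falling while the document count stays flat would suggest that freed space is not being reused.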

You can read more about it here: Managing disk space in MongoDB

Hope this helps a bit.