MongoDB – real disk usage is LOWER than dbStats.storageSize

disk-space, mongodb

I'm a bit surprised by what I observe on my mongo server, so I would like to know if anybody understands how this is possible.

I have one collection of around 15K documents. For each of them I store 3 files in GridFS, with an average size of 1MB per file. So the expected storageSize should be somewhere near 60GB (assuming the document itself is about 1MB as well). I can't check all my data by hand, but I really want to be sure everything is here and that no data is missing for any of the documents.
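One way to cross-check this without going through the documents by hand should be to sum the declared length of every GridFS file (this assumes the default fs bucket and its standard length field) and compare the total with the dataSize reported below, something like:

> db.fs.files.aggregate([{$group: {_id: null, files: {$sum: 1}, totalBytes: {$sum: "$length"}}}])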

Here is what I get:

> db.stats();
{
    "db" : "myBase",
    "collections" : 6,
    "objects" : 348822,
    "avgObjSize" : 195475.27135329766,
    "dataSize" : 68186075104,
    "storageSize" : 69329817136,
    "numExtents" : 73,
    "indexes" : 14,
    "indexSize" : 42171808,
    "fileSize" : 67108864,
    "nsSizeMB" : 16,
    "dataFileVersion" : {
        "major" : 4,
        "minor" : 5
    },
    "ok" : 1
}

So storageSize ≈ 65GB, which is acceptable.

But when I look at the file system

$ ps -edf | grep mongo
user 20728     1  0  2014 ?        05:31:03 /opt/tools/bin/mongod --dbpath=/local/mongo-data

$ cd /local/mongo-data
$ du -h
 4.0K   ./_tmp
 721M   ./journal
 8.8G   .
$ du -h *
 721M   journal
 65M    myBase.0
 2.0G   myBase.37
 2.0G   myBase.5
 2.0G   myBase.7
 2.0G   myBase.8
 17M    myBase.ns
 4.0K   _tmp

How come the data on disk takes LESS space than what mongo reports?
And why doesn't a stat with a value of around 8GB appear anywhere?


I'm adding the stats from fs.files and fs.chunks below, in case they help to understand what is happening.

> db.fs.files.stats();
{
    "ns" : "myBase.fs.files",
    "count" : 48911,
    "size" : 14316208,
    "avgObjSize" : 292.69914743104823,
    "storageSize" : 18898944,
    "numExtents" : 7,
    "nindexes" : 2,
    "lastExtentSize" : 7647232,
    "paddingFactor" : 1,
    "systemFlags" : 1,
    "userFlags" : 0,
    "totalIndexSize" : 7873488,
    "indexSizes" : {
        "_id_" : 2076704,
        "filename_1_uploadDate_1" : 5796784
    },
    "ok" : 1
}

> db.fs.chunks.stats();
{
    "ns" : "myBase.fs.chunks",
    "count" : 283537,
    "size" : 67787872056,
    "avgObjSize" : 239079.45719958947,
    "storageSize" : 68819078704,
    "numExtents" : 48,
    "nindexes" : 2,
    "lastExtentSize" : 2146426864,
    "paddingFactor" : 1,
    "systemFlags" : 1,
    "userFlags" : 0,
    "totalIndexSize" : 27324192,
    "indexSizes" : {
        "_id_" : 11781616,
        "files_id_1_n_1" : 15542576
    },
    "ok" : 1
}

Edit: here are the results of the commands Antonis asked for

> db.fs.chunks.find({},{_id:1}).itcount()
283537

$ du -ah .
17M ./myBase.ns
2.0G    ./myBase.37
2.0G    ./myBase.7
2.0G    ./myBase.5
4.0K    ./_tmp
65M ./myBase.0
4.0K    ./journal/lsn
724M    ./journal/j._19
724M    ./journal
2.0G    ./myBase.8
8.8G    .

Best Answer

Whatever you do, do not shut down that mongod process until you have backed up your data (see below). There are files missing from that database directory, and I suspect they have been manually deleted at the OS level. The data files should never have gaps in them: you should have files from myBase.0 all the way up to myBase.37, with no numbers missing.
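A quick way to list which numbered data files are actually present (standard Linux tools, dbpath taken from your question):

$ ls /local/mongo-data | grep -E '^myBase\.[0-9]+$' | sort -t. -k2 -n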

To explain: if you delete the files with rm or similar at the OS level, the deletion will appear to succeed, the OS allows it, but because the running mongod process holds open file handles to those files, the operating system will not actually release them until you stop the process.

Here's an example of what the lsof command shows for a normal data file called foo.0:

mongod     5786             adam  mem       REG                9,0   67108864  805306654 /data/db/test0/foo.0

And here is what it looks like when you have manually deleted the file:

mongod     5786             adam   24u      REG                9,0   67108864  805306654 /data/db/test0/foo.0 (deleted)

From within MongoDB the file still exists and is accessible: I can query it, run db.stats(), and so on, successfully. But if that mongod process is restarted, the file will actually be removed and the data is at that point essentially gone (barring efforts to undelete it at the filesystem level).
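You can check for this condition on your own server with something along these lines (the PID comes from the ps output in your question; lsof output details vary a little by platform):

$ sudo lsof -p 20728 | grep deleted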

So, what should you do? Well, the first thing is to make sure you have a copy of the data before shutting down that process and losing it. To do that you have a couple of options:

  1. If this is a node in a replica set (even a single-node one), add a new secondary member and let it sync - the sync will still succeed, and you will then have a fully populated copy of the data ready to take over on that secondary (see the sketch after this list). (Note: if this is not a replica set, you can't turn it into one without a process restart, and that restart would delete the data - my recommendation is to always run a replica set, even a single node, for anything in production.)
  2. Run mongodump to dump the data out somewhere else before it gets deleted (see the example after this list). This won't be fast, and you will need plenty of space, but at least it will give you an easily restorable copy of your data.
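To sketch those two options (hostname, port and paths below are placeholders, adjust them to your environment): if this is already a replica set, adding a secondary from the mongo shell on the primary looks like:

> rs.add("newsecondary.example.net:27017")

And a mongodump run to another disk with enough free space would be along the lines of:

$ mongodump --host localhost:27017 --db myBase --out /path/with/space/dump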

A repair of the database might work, but only if you have enough space on that disk to accommodate 2x the data plus index size. It must be the repair command issued against the running process, not a restart with --repair, because the restart would cause the files to be deleted.
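For reference, issuing that from the mongo shell looks like this (it blocks while it runs and needs the 2x free space mentioned above):

> use myBase
> db.repairDatabase()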

Finally, you need to figure out what is deleting these files and stop it. Is there a cron job or other process that automatically deletes large files (the data files will usually be 2GB each) over a certain age, or similar? I've seen things like that wipe out MongoDB data files before, with exactly these results.
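A few places worth checking for such a job (standard Linux locations; the grep uses the dbpath from your ps output, adjust for your distribution):

$ crontab -l
$ sudo crontab -l
$ grep -r "mongo-data" /etc/cron* 2>/dev/null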