MongoDB Files – What are These Files?

database-tuning mongodb

I was looking into my server and I found this:

[root@host ~]# cd /var/lib/mongo/journal/
[root@host journal]# ls -lh
total 3.1G
-rw-------. 1 mongod mongod 1.0G Apr  7 01:18 j._0
-rw-------. 1 mongod mongod   88 Apr  7 01:18 lsn
-rw-------. 1 mongod mongod 1.0G Dec 19 23:03 prealloc.1
-rw-------. 1 mongod mongod 1.0G Dec 19 23:06 prealloc.2

But in my database, db.stats() shows:

> db.stats()
{
    "db" : "gpstracker",
    "collections" : 5,
    "objects" : 59127,
    "avgObjSize" : 139.84318500854093,
    "dataSize" : 8268508,
    "storageSize" : 11198464,
    "numExtents" : 10,
    "indexes" : 3,
    "indexSize" : 1937712,
    "fileSize" : 201326592,
    "nsSizeMB" : 16,
    "dataFileVersion" : {
        "major" : 4,
        "minor" : 5
    },
    "ok" : 1
}

And I have only one database, with 3 collections in it.

Maybe this is something I just don't know about yet.

I searched a bit and found things about compacting a database. Would that help here? Or are these files completely normal, and is this how they should be?

What if my database grows, say from ~200MB (where it is now) to ~1GB? How much do these files grow? Can these journal files be optimized somehow?

Sorry if my questions seem obvious. I'm a beginner in the database department.

Thanks in advance

Best Answer

Those files belong to the journal, which is essentially what guarantees the consistency of your data as you write it to the database. It is preallocated at 3GB (unless you start mongod with --smallfiles) and is not included in the size of your database. The journal does not grow with your database; its size depends on how much you are writing. Unless you are doing a lot of writes, the journal will stay at 3GB. If you are curious about journaling and how it works, there is a great write-up here.
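To make the numbers concrete, here is a minimal sketch in plain JavaScript (runnable in Node, not tied to any MongoDB API) that adds up the journal files from the `ls -lh` listing in the question and puts the total next to the `fileSize` value from `db.stats()`. The per-file size of 1 GiB is an approximation read off the rounded listing, and the names `journalFiles` and `toMiB` are made up for illustration.

```javascript
const GiB = 1024 ** 3;
const MiB = 1024 ** 2;

// Sizes read off the `ls -lh` listing in the question. ls rounds to "1.0G",
// so 1 GiB per file is an assumption, not an exact byte count.
const journalFiles = {
  "j._0": 1 * GiB,
  "lsn": 88,
  "prealloc.1": 1 * GiB,
  "prealloc.2": 1 * GiB,
};

// fileSize (in bytes) from the db.stats() output in the question.
const dbFileSize = 201326592;

// Convert a byte count to whole mebibytes.
function toMiB(bytes) {
  return Math.round(bytes / MiB);
}

// Sum of all journal directory entries, in bytes.
const journalTotal = Object.values(journalFiles).reduce((a, b) => a + b, 0);

console.log("journal total (MiB):", toMiB(journalTotal));
console.log("database files (MiB):", toMiB(dbFileSize));
```

The two figures are independent of each other: the journal total comes from the preallocated files, while `fileSize` tracks the database's own data files.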

As for the database itself, the disk space usage question has been asked and answered many times. Yes, a regular compact (and a repair or resync) can be needed, but it very much depends on what you are doing with the data.
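As a rough aid for reading the `db.stats()` output above, the following sketch (plain JavaScript, runnable in Node) expresses `dataSize` as a share of `storageSize`, and `storageSize` as a share of `fileSize`. The `stats` object is copied by hand from the question, and the helper `pct` is hypothetical. The sketch does not decide whether a compact is needed; it only makes the proportions easier to see, and no threshold is implied.

```javascript
// Values copied from the db.stats() output in the question (all in bytes).
const stats = {
  dataSize: 8268508,      // size of the stored documents
  storageSize: 11198464,  // space allocated to the collections
  fileSize: 201326592,    // total size of the data files on disk
};

// Percentage of `whole` taken up by `part`, with one decimal place.
function pct(part, whole) {
  return ((100 * part) / whole).toFixed(1);
}

console.log("data as % of allocated storage:", pct(stats.dataSize, stats.storageSize));
console.log("allocated storage as % of data files:", pct(stats.storageSize, stats.fileSize));
```

If the first percentage drops noticeably over time while the workload involves many deletes or document moves, that may be a hint that a compact is worth looking into, but this is an assumption to verify against your own usage.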