MongoDB – Understanding document size in MongoDB

mongodb

My collection.stats():

{
    "ns" : "csdbprotobuf.archive",
    "count" : 429787895,
    "size" : 48374032272,
    "avgObjSize" : 112,
    "storageSize" : 56527326864,
    "numExtents" : 47,
    "nindexes" : 1,
    "lastExtentSize" : 2146426864,
    "paddingFactor" : 1,
    "systemFlags" : 1,
    "userFlags" : 1,
    "totalIndexSize" : 19917512720,
    "indexSizes" : {
        "_id_" : 19917512720
    },
    "ok" : 1
}

paddingFactor is 1 and avgObjSize is 112, but the average document size that I measure with the Object.bsonsize() function is 73.4 bytes. I have no updates, only inserts and deletes. How can I optimize space consumption? How much RAM does each document consume: 112 or 73.4 bytes?

Upd: How many documents can be stored in one page: <page size>/<avgObjSize from db.stats()> or <page size>/<avg size from Object.bsonsize()>, i.e. 4096/112 or 4096/73.4?
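
For reference, the ratios being compared can be worked out from the stats output above. This is only a sketch of the arithmetic: the 4096-byte page size is the figure assumed in the question, and the helper name docsPerPage is made up for illustration.

```javascript
// Figures copied from collection.stats() in the question.
const stats = {
  count: 429787895,
  size: 48374032272,
  storageSize: 56527326864,
  totalIndexSize: 19917512720,
};

// Assumption: 4096-byte pages, as used in the question.
const PAGE_SIZE = 4096;

// Hypothetical helper: whole documents that fit in one page.
function docsPerPage(pageSize, bytesPerDoc) {
  return Math.floor(pageSize / bytesPerDoc);
}

// size / count; the stats output shows this as avgObjSize 112.
const avgFromStats = stats.size / stats.count;

// Index bytes that exist per document (only the _id_ index here).
const indexBytesPerDoc = stats.totalIndexSize / stats.count;

// Space allocated to the collection beyond the reported data size.
const unusedBytes = stats.storageSize - stats.size;

console.log("size / count:", avgFromStats.toFixed(2));
console.log("index bytes per document:", indexBytesPerDoc.toFixed(2));
console.log("storageSize - size:", unusedBytes);
console.log("docs per page at 112 bytes:", docsPerPage(PAGE_SIZE, 112));
console.log("docs per page at 73.4 bytes:", docsPerPage(PAGE_SIZE, 73.4));
```

The index ratio is included because the _id_ index also competes for RAM alongside the documents.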

Best Answer

To answer your question, I start with an empty collection, csdbprotobuf.archive, on a standalone server that has nothing in cache yet:

db.serverStatus({workingSet:1})

"workingSet" : {
    "note" : "thisIsAnEstimate",
    "pagesInMemory" : 10,
    "computationTimeMicros" : 4736,
    "overSeconds" : 81
},

Create the database and insert 10000 documents, each with an integer field a and a string field name holding the nine-character value 'some name':

use csdbprotobuf
t=db.archive
for(var i = 1; i <= 10000; i++){t.insert({a : i, name : 'some name'})}
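
To get a feel for what Object.bsonsize() would measure for one of these documents, the BSON layout can be added up by hand. The following is a minimal sketch, not MongoDB's own code. It assumes the shell stores the plain number in a as a 64-bit double and adds an ObjectId _id on insert; bsonSizeSketch and the { objectId: true } placeholder are hypothetical names used only here.

```javascript
// Sketch of BSON sizing for three element types only: double, string, ObjectId.
// Each element is: 1 type byte + field name as a C string (name + NUL) + value.
const encoder = new TextEncoder();

function utf8Length(s) {
  return encoder.encode(s).length;
}

function valueBytes(value) {
  if (typeof value === "number") {
    return 8; // assumption: stored as a 64-bit double
  }
  if (typeof value === "string") {
    return 4 + utf8Length(value) + 1; // int32 length + bytes + NUL
  }
  if (value && value.objectId === true) {
    return 12; // placeholder standing in for a 12-byte ObjectId
  }
  throw new Error("type not covered by this sketch");
}

function bsonSizeSketch(doc) {
  let total = 4 + 1; // int32 document length + trailing NUL
  for (const [name, value] of Object.entries(doc)) {
    total += 1 + (utf8Length(name) + 1) + valueBytes(value);
  }
  return total;
}

// One document as inserted by the loop above, with the _id the shell adds.
const sample = { _id: { objectId: true }, a: 1, name: "some name" };
console.log(bsonSizeSketch(sample)); // 53
```

Under those assumptions each document is 53 bytes of BSON, before any record header or padding the storage engine adds.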

Run an explain to load the data into memory (known as pre-heating):

t.find().explain()
{
    "cursor" : "BasicCursor",
    "isMultiKey" : false,
    "n" : 10000,
    "nscannedObjects" : 10000,
    "nscanned" : 10000,
    "nscannedObjectsAllPlans" : 10000,
    "nscannedAllPlans" : 10000,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 78,
    "nChunkSkips" : 0,
    "millis" : 1,
    "server" : "precise64_work:27000",
    "filterSet" : false
}

What is the state of memory now, according to db.serverStatus({workingSet:1})?


    "workingSet" : {
        "note" : "thisIsAnEstimate",
        "pagesInMemory" : 363,
        "computationTimeMicros" : 2136,
        "overSeconds" : 600
    },

Initially I had 10 pages of data in memory; after the pre-heat using .explain(), I have 363 pages in memory. Resident memory started at 31MB and is now 34MB, a difference of 3MB or 3072KB.

3072KB / 10000 documents ≈ 0.31KB (roughly 315 bytes) per document.

10000 documents / 353 new pages of data (363 - 10) ≈ 28 documents per page
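
The two ratios can be recomputed from the measurements quoted above; bytes per document is memory divided by document count. This is a rough sketch only. The 4096-byte page size is an assumption carried over from the question, and the variable names are made up for illustration.

```javascript
// Measurements quoted in the answer.
const docs = 10000;
const pagesBefore = 10;
const pagesAfter = 363;
const residentBeforeMB = 31;
const residentAfterMB = 34;

// Assumption: 4096-byte pages (the size used in the question).
const PAGE_SIZE = 4096;

// Pages and resident bytes added by the pre-heat.
const newPages = pagesAfter - pagesBefore;
const residentDeltaBytes = (residentAfterMB - residentBeforeMB) * 1024 * 1024;

// Estimate 1: growth in resident memory divided by document count.
const bytesPerDocByResident = residentDeltaBytes / docs;

// Estimate 2: documents per newly loaded page, and the per-document
// size that implies if a page is PAGE_SIZE bytes.
const docsPerNewPage = docs / newPages;
const bytesPerDocByPages = (newPages * PAGE_SIZE) / docs;

console.log("bytes per document (resident):", bytesPerDocByResident.toFixed(1)); // 314.6
console.log("documents per page:", docsPerNewPage.toFixed(1));                   // 28.3
console.log("bytes per document (pages):", bytesPerDocByPages.toFixed(1));       // 144.6
```

The resident-memory figure is coarse (whole megabytes) and probably covers more than the document data itself, so the two per-document estimates should not be expected to agree.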

Using MongoDB's db.serverStatus({workingSet:1}) command, I was able to determine (approximately) how many documents fit onto a page.

It is important to note, though, that documents in MongoDB can differ from one another in size and shape because of the polymorphic nature of its schema design. So trying to determine exactly how much space one document is going to take is probably not what you are looking for.

Just because one document takes up a given amount of space doesn't mean that the next document won't take up twice as much or even more. This differs from a traditional RDBMS, where you typically know how much space each row can use.