MongoDB document size for collection – impact on RAM and query performance

mongodb, mongodb-3.0, performance, query-performance

I was advised on Stack Overflow to ask this question on dba.stackexchange instead.

We are using MongoDB version 3.0 with WiredTiger storage.

As newbies to MongoDB, we may have designed our schema naively, based on limited knowledge from various books and articles, and we want to improve the design for better performance.

One or two collections have an average object size of 52.3 KB, and these collections will likely grow to millions of documents, at which point we may shard them. What I want to know is: what is the impact on RAM when we query such a collection? Please note that the documents will not grow much in size over time.
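
For reference, figures like the 52.3 KB average come from collection statistics; a minimal pymongo sketch for checking them (the database and collection names mydb and profiles are hypothetical):

    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")
    db = client["mydb"]

    # collStats reports avgObjSize in bytes, plus data and index sizes.
    stats = db.command("collStats", "profiles")
    print("documents:     ", stats["count"])
    print("avgObjSize:    ", stats["avgObjSize"], "bytes")
    print("totalIndexSize:", stats["totalIndexSize"], "bytes")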

E.g. one document (with an average object size of 52 KB) has 91 fields/attributes, including arrays and sub-documents. Say I am interested in around 5 of those fields in a particular query and I specify them in the projection argument; I have verified that appropriate indexes are being used by the query. Will MongoDB load only those 5 fields into RAM, or the entire 52 KB document with all 91 fields? My question applies to both of the following (a sketch of each is shown after this list):

  • Normal queries
  • Aggregation based queries
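
To make the two cases concrete, here is a minimal pymongo sketch of both query shapes; the collection and field names (profiles, status, name, and so on) are hypothetical stand-ins for the 5 fields of interest:

    from pymongo import MongoClient

    coll = MongoClient()["mydb"]["profiles"]

    # Normal query: filter on an indexed field, project 5 of the 91 fields.
    cursor = coll.find(
        {"status": "active"},
        {"name": 1, "email": 1, "city": 1, "status": 1, "lastSeen": 1},
    )

    # Aggregation query: the same projection expressed as a $project stage.
    pipeline = [
        {"$match": {"status": "active"}},
        {"$project": {"name": 1, "email": 1, "city": 1,
                      "status": 1, "lastSeen": 1}},
    ]
    results = list(coll.aggregate(pipeline))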

This will help in estimating my working set size.
Also, there are many other kinds of queries requiring different sets of attributes on the same collection and documents, so covered queries (sketched below) may not be feasible for all of them.
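
For the few query shapes where it is feasible, a covered query keeps MongoDB from touching the documents at all. A minimal sketch with the same hypothetical names; note that _id must be excluded from the projection unless it is part of the index:

    from pymongo import MongoClient

    coll = MongoClient()["mydb"]["profiles"]

    # Compound index containing both the filter field and the projected fields.
    coll.create_index([("status", 1), ("lastSeen", 1)])

    # Covered: every field in the filter and the projection lives in the
    # index, and _id is excluded, so no documents need to be fetched.
    cursor = coll.find(
        {"status": "active"},
        {"_id": 0, "status": 1, "lastSeen": 1},
    )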

Should I explore splitting the data into different collections depending on usage patterns, even though they may all really be 1:1 relationships? A sketch of this follows. On the flip side, splitting would no longer guarantee atomic writes when many attributes are updated together.
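
If we do experiment with splitting, the shape would be roughly the sketch below: a slim "hot" collection for the frequently queried fields and a "cold" one sharing the same _id. The collection names are hypothetical, and on 3.0 the pair of writes is not atomic as a whole (each individual write still is):

    from pymongo import MongoClient
    from bson import ObjectId

    db = MongoClient()["mydb"]
    doc_id = ObjectId()

    # Hot collection: the handful of fields most queries actually touch.
    db.profiles_core.insert_one(
        {"_id": doc_id, "name": "Ada", "status": "active", "lastSeen": None}
    )

    # Cold collection: the remaining ~86 rarely-read fields, same _id.
    db.profiles_detail.insert_one(
        {"_id": doc_id, "bio": "...", "preferences": {}, "history": []}
    )

    # Reads that need both halves do two lookups by the shared _id.
    core = db.profiles_core.find_one({"_id": doc_id})
    detail = db.profiles_detail.find_one({"_id": doc_id})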

The reason I ask is that I recently observed that geoNear aggregation queries are noticeably faster if I trim the collection down to only a few essential attributes. My hunch is that MongoDB brings entire documents into RAM, since the storage engine reads and caches whole documents rather than individual fields.
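
For reference, the geoNear aggregation in question looks roughly like this (field names hypothetical; $geoNear must be the first pipeline stage and requires a 2dsphere index):

    from pymongo import MongoClient, GEOSPHERE

    coll = MongoClient()["mydb"]["profiles"]
    coll.create_index([("location", GEOSPHERE)])

    pipeline = [
        {"$geoNear": {
            "near": {"type": "Point", "coordinates": [72.85, 19.01]},
            "distanceField": "dist",
            "maxDistance": 5000,  # metres, since spherical is true
            "spherical": True,
        }},
        # Trimming to a few fields here happens after $geoNear, i.e. after
        # the matching documents have already been read from disk.
        {"$project": {"name": 1, "location": 1, "dist": 1}},
        {"$limit": 20},
    ]
    nearby = list(coll.aggregate(pipeline))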

Best Answer

In the future, ask to have the question migrated here instead of double posting.

From the MongoDB documentation:

To calculate how much RAM you need, you must calculate your working set size, or the portion of your data that clients use most often. This depends on your access patterns, what indexes you have, and the size of your documents. Because MongoDB uses a thread per connection model, each database connection also will need up to 1MB of RAM, whether active or idle.
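
One way to sanity-check your working set against available RAM is to read the WiredTiger cache counters from serverStatus. A minimal pymongo sketch, assuming a default localhost deployment:

    from pymongo import MongoClient

    admin = MongoClient()["admin"]

    status = admin.command("serverStatus")
    cache = status["wiredTiger"]["cache"]

    # How full the WiredTiger cache is versus its configured ceiling.
    used = cache["bytes currently in the cache"]
    limit = cache["maximum bytes configured"]
    print("cache used: %.1f%% of %d MB" % (100.0 * used / limit, limit // 2**20))

    # Each connection can cost up to ~1 MB of RAM on top of the cache.
    print("connections:", status["connections"]["current"])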

If the query is not completely covered by an index then the entire document is loaded into RAM by MongoDB. This is the same regardless of the type of query.
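
You can verify which case a given query falls into with explain: a covered plan reports totalDocsExamined: 0, while anything higher means whole documents were pulled into the cache. A minimal sketch, reusing hypothetical names from the question:

    from pymongo import MongoClient

    coll = MongoClient()["mydb"]["profiles"]
    coll.create_index([("status", 1), ("lastSeen", 1)])

    plan = coll.find(
        {"status": "active"},
        {"_id": 0, "status": 1, "lastSeen": 1},
    ).explain()

    # 0 means the index alone answered the query (covered); anything
    # higher means full documents were loaded into RAM to satisfy it.
    print("totalDocsExamined:", plan["executionStats"]["totalDocsExamined"])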

As for whether or not you should split your collection, that's really a design decision that requires a lot more background to understand and weigh in on. You'll need to run tests to see whether the gains for your query patterns are worth the loss of write atomicity (potentially a business-driven decision).

I probably wouldn't recommend that approach, though. Instead, assess what your true SLAs are and whether you're provisioning the right amount of hardware (RAM) for the desired results. Your real question here is: "Do I have enough RAM to satisfy my query performance requirements for cached data?" If the answer is no, figure out what design works best within your constraints.