MongoDB document size for collection – impact on RAM and query performance

mongodb, mongodb-3.0, performance, query-performance

I was advised on Stack Overflow to ask this question on dba.stackexchange instead.

We are using MongoDB version 3.0 with WiredTiger storage.

As newbies to MongoDB, we may have designed our schema naively, based on limited knowledge from various books and articles, and we want to improve the design for better performance.

One or two collections have an average object size of 52.3 KB, and these collections will likely grow to millions of documents, at which point we may shard them. What I want to know is: what is the impact on RAM when we query such a collection? Please note that the documents will not grow much in size over time.
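
For reference, figures like the 52.3 KB average come from collection statistics; a minimal pymongo sketch for checking them (the database and collection names mydb and profiles are hypothetical):

    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")
    db = client["mydb"]

    # collStats reports avgObjSize in bytes, plus data and index sizes.
    stats = db.command("collStats", "profiles")
    print("documents:     ", stats["count"])
    print("avgObjSize:    ", stats["avgObjSize"], "bytes")
    print("totalIndexSize:", stats["totalIndexSize"], "bytes")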

E.g. one document (with an average object size of 52 KB) has 91 fields/attributes, including arrays and sub-documents. Say I am interested in around 5 of those fields in a particular query and I specify them in the projection argument; I have verified that appropriate indexes are being used by the query. Will MongoDB load only those 5 fields into RAM, or the entire 52 KB document with all 91 fields? My question applies to both of the following (a sketch of each is shown after this list):

  • Normal queries
  • Aggregation based queries
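
To make the two cases concrete, here is a minimal pymongo sketch of both query shapes; the collection and field names (profiles, status, name, and so on) are hypothetical stand-ins for the 5 fields of interest:

    from pymongo import MongoClient

    coll = MongoClient()["mydb"]["profiles"]

    # Normal query: filter on an indexed field, project 5 of the 91 fields.
    cursor = coll.find(
        {"status": "active"},
        {"name": 1, "email": 1, "city": 1, "status": 1, "lastSeen": 1},
    )

    # Aggregation query: the same projection expressed as a $project stage.
    pipeline = [
        {"$match": {"status": "active"}},
        {"$project": {"name": 1, "email": 1, "city": 1,
                      "status": 1, "lastSeen": 1}},
    ]
    results = list(coll.aggregate(pipeline))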

This will help in estimating my working set size.
Also, there are many other kinds of queries requiring different sets of attributes on the same collection and documents, so covered queries (sketched below) may not be feasible for all of them.
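
For the few query shapes where it is feasible, a covered query keeps MongoDB from touching the documents at all. A minimal sketch with the same hypothetical names; note that _id must be excluded from the projection unless it is part of the index:

    from pymongo import MongoClient

    coll = MongoClient()["mydb"]["profiles"]

    # Compound index containing both the filter field and the projected fields.
    coll.create_index([("status", 1), ("lastSeen", 1)])

    # Covered: every field in the filter and the projection lives in the
    # index, and _id is excluded, so no documents need to be fetched.
    cursor = coll.find(
        {"status": "active"},
        {"_id": 0, "status": 1, "lastSeen": 1},
    )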

Should I explore splitting the data into different collections depending on usage patterns, even though they may all really be 1:1 relationships? A sketch of this follows. On the flip side, splitting would no longer guarantee atomic writes when many attributes are updated together.
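
If we do experiment with splitting, the shape would be roughly the sketch below: a slim "hot" collection for the frequently queried fields and a "cold" one sharing the same _id. The collection names are hypothetical, and on 3.0 the pair of writes is not atomic as a whole (each individual write still is):

    from pymongo import MongoClient
    from bson import ObjectId

    db = MongoClient()["mydb"]
    doc_id = ObjectId()

    # Hot collection: the handful of fields most queries actually touch.
    db.profiles_core.insert_one(
        {"_id": doc_id, "name": "Ada", "status": "active", "lastSeen": None}
    )

    # Cold collection: the remaining ~86 rarely-read fields, same _id.
    db.profiles_detail.insert_one(
        {"_id": doc_id, "bio": "...", "preferences": {}, "history": []}
    )

    # Reads that need both halves do two lookups by the shared _id.
    core = db.profiles_core.find_one({"_id": doc_id})
    detail = db.profiles_detail.find_one({"_id": doc_id})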

The reason I ask is that I recently observed that geoNear aggregation queries are noticeably faster if I trim the collection down to only a few essential attributes. My hunch is that MongoDB brings entire documents into RAM, since the storage engine reads and caches whole documents rather than individual fields.
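
For reference, the geoNear aggregation in question looks roughly like this (field names hypothetical; $geoNear must be the first pipeline stage and requires a 2dsphere index):

    from pymongo import MongoClient, GEOSPHERE

    coll = MongoClient()["mydb"]["profiles"]
    coll.create_index([("location", GEOSPHERE)])

    pipeline = [
        {"$geoNear": {
            "near": {"type": "Point", "coordinates": [72.85, 19.01]},
            "distanceField": "dist",
            "maxDistance": 5000,  # metres, since spherical is true
            "spherical": True,
        }},
        # Trimming to a few fields here happens after $geoNear, i.e. after
        # the matching documents have already been read from disk.
        {"$project": {"name": 1, "location": 1, "dist": 1}},
        {"$limit": 20},
    ]
    nearby = list(coll.aggregate(pipeline))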

Best Answer

In the future, ask to have the question migrated here instead of double posting.

From the MongoDB documentation:

To calculate how much RAM you need, you must calculate your working set size, or the portion of your data that clients use most often. This depends on your access patterns, what indexes you have, and the size of your documents. Because MongoDB uses a thread per connection model, each database connection also will need up to 1MB of RAM, whether active or idle.
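
One way to sanity-check your working set against available RAM is to read the WiredTiger cache counters from serverStatus. A minimal pymongo sketch, assuming a default localhost deployment:

    from pymongo import MongoClient

    admin = MongoClient()["admin"]

    status = admin.command("serverStatus")
    cache = status["wiredTiger"]["cache"]

    # How full the WiredTiger cache is versus its configured ceiling.
    used = cache["bytes currently in the cache"]
    limit = cache["maximum bytes configured"]
    print("cache used: %.1f%% of %d MB" % (100.0 * used / limit, limit // 2**20))

    # Each connection can cost up to ~1 MB of RAM on top of the cache.
    print("connections:", status["connections"]["current"])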

If the query is not completely covered by an index then the entire document is loaded into RAM by MongoDB. This is the same regardless of the type of query.
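
You can verify which case a given query falls into with explain: a covered plan reports totalDocsExamined: 0, while anything higher means whole documents were pulled into the cache. A minimal sketch, reusing hypothetical names from the question:

    from pymongo import MongoClient

    coll = MongoClient()["mydb"]["profiles"]
    coll.create_index([("status", 1), ("lastSeen", 1)])

    plan = coll.find(
        {"status": "active"},
        {"_id": 0, "status": 1, "lastSeen": 1},
    ).explain()

    # 0 means the index alone answered the query (covered); anything
    # higher means full documents were loaded into RAM to satisfy it.
    print("totalDocsExamined:", plan["executionStats"]["totalDocsExamined"])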

As for whether or not you should split your collection, that's really a design decision that requires a lot more background to understand and weigh in on. You'll need to run tests to see whether the gains for your query patterns are worth the loss of write atomicity (potentially a business-driven decision).

I probably wouldn't recommend that approach, though. Instead, assess what your true SLAs are and whether you're provisioning the right amount of hardware (RAM) for the desired results. Your real question here is: "Do I have enough RAM to satisfy my query performance requirements for cached data?" If the answer is no, figure out what design works best within your constraints.