MongoDB – Sizing a MongoDB shard

mongodb, sharding

I am in the process of planning a MongoDB deployment. This will be my first in production, and I have a question regarding the disk size of each shard in the cluster.

The cluster will store time-series building sensor data and could accumulate 2 TB of data per year quite soon after deployment.

So, I plan to start with a two-shard cluster (2 x 3-node replica sets), since that puts the query routers and config servers in place from the start. However, I could do with some advice on how to decide how much data to store per shard.

Each shard node will have a minimum of 48 GB of RAM and, of course, a disk configuration that meets the IOPS requirement.

If I am able to satisfy the IOPS requirement of the applications using this database, what is to stop me sizing for 2 TB per shard? Is there a limit on the volume of data a single shard should hold, or guidelines to help my decision-making process?

I am reading a lot about performance issues when the data volume exceeds the available RAM per host. But if the disks provide enough IOPS, surely this isn't a problem? I appreciate that disks will still be far slower than memory, but if MongoDB performs poorly whenever you exceed the RAM size, then how do people deal with large databases? The cost of continually adding small shards to a cluster just to stay within RAM is huge!

In short, to keep cluster expansion to a minimum: if I can satisfy my IOPS requirement, can I safely store any amount of data on a single shard, or is there a much lower recommended limit, and if so, for what reason?

Also, I know I must try to keep my index size below my RAM size to ensure efficient query execution. Here is an example:

If my data volume is 1 TB per shard and I have 48 GB of RAM on each of the shard nodes, is there a way to estimate index size? The working set is hard to estimate, since this is a data-logging system that may update anything up to the total point count entering the database every minute: perhaps 30,000 documents, inserted once and then updated with per-minute data for the whole day, followed by a new 30,000 documents the next day, then updated again, and so on.
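For what it's worth, once a representative sample is loaded I can read the index footprint directly from collection statistics and scale it up. A minimal pymongo sketch, where the connection string, database name and collection name are only placeholders for illustration:

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # placeholder address
db = client["sensors"]                             # placeholder database name

# collStats reports per-collection data and index sizes in bytes.
stats = db.command("collStats", "readings")        # placeholder collection name

print("document count:  ", stats["count"])
print("data size (MB):  ", stats["size"] / 1024 ** 2)
print("index size (MB): ", stats["totalIndexSize"] / 1024 ** 2)
for name, size in stats["indexSizes"].items():
    print(f"  {name}: {size / 1024 ** 2:.1f} MB")
```

My thinking is that loading a representative day or two of documents and scaling the reported totalIndexSize linearly would give a rough per-shard estimate, but I'd welcome a better approach.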

Best Answer

You'll need to increase the chunk size (default: 64 MB); otherwise you'll be limited there.

http://blog.mongodb.org/post/100676030403/sharding-pitfalls-part-iii-chunk-balancing-and
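For a sense of scale: at the default 64 MB chunk size, 1 TB of sharded data is on the order of 16,000 chunks and 2 TB is around 32,000, which is exactly the kind of chunk count that post warns about. The chunk size is stored in the settings collection of the config database; here is a minimal sketch of raising it with pymongo, assuming a mongos at an example address and an example value of 256 MB:

```python
from pymongo import MongoClient

# Connect to a mongos router (address is an example).
client = MongoClient("mongodb://mongos.example.net:27017")

# The balancer reads the chunk size (in MB) from config.settings.
# Upsert the "chunksize" document to raise it from the 64 MB default to 256 MB.
client.config.settings.update_one(
    {"_id": "chunksize"},
    {"$set": {"value": 256}},
    upsert=True,
)

# Confirm the new setting.
print(client.config.settings.find_one({"_id": "chunksize"}))
```

Note that increasing the chunk size only affects future splits: existing chunks grow into the new limit through inserts and updates, and nothing is merged retroactively.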

General limit information from MongoDB:

Database Size

The MMAPv1 storage engine limits each database to no more than 16000 data files. Since each data file grows to at most 2 GB, this means a single MMAPv1 database has a maximum size of 32 TB. Setting the storage.mmapv1.smallFiles option (which caps data files at 512 MB) reduces this limit to 8 TB.

Data Size

Changed in version 3.0.

Using the MMAPv1 storage engine, a single mongod instance cannot manage a data set that exceeds the maximum virtual memory address space provided by the underlying operating system.

Virtual Memory Limitations:

Linux: 64 terabytes (journaled) - 128 terabytes (not journaled)

Windows Server 2012 R2/Windows 8.1: 64 terabytes (journaled) - 128 terabytes (not journaled)

Windows (otherwise): 4 terabytes (journaled) - 8 terabytes (not journaled)

(The WiredTiger storage engine is not subject to this limitation.)
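Since these limits apply to MMAPv1 only, it's worth confirming which storage engine each mongod is actually running. A quick check with pymongo (the address is an example):

```python
from pymongo import MongoClient

# Connect directly to a mongod (address is an example).
client = MongoClient("mongodb://shard0-node0.example.net:27017")

# serverStatus reports the storage engine in use ("wiredTiger" or "mmapv1").
status = client.admin.command("serverStatus")
print(status["storageEngine"]["name"])
```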