I am planning my first production MongoDB deployment, and I have a question about the disk size of each shard in the cluster.
The cluster will store time-series building-sensor data, with the potential to accumulate 2 TB of data per year quite soon after deployment.
So, I plan to start with a two-shard cluster (2 x 3-node replica sets), since that puts the query routers and config servers in place from the get-go. However, I could use some advice on how to decide how much data to store per shard.
Each shard node will have a minimum of 48 GB of RAM and, of course, the right disk configuration to meet the IOPS requirement.
If I can satisfy the IOPS requirement of the applications using this database, what is to stop me sizing for 2 TB per shard? Is there a limit on the volume of data a single shard should hold, or guidelines to inform my decision?
I am reading a lot about performance issues when the data volume exceeds the available RAM per host. But if the disks provide enough IOPS, surely this isn't a problem? I appreciate that disks will still be far slower than memory, but if MongoDB performs poorly once you exceed the RAM size, how do people deal with large databases? The cost of continually adding small shards to a cluster just to stay within RAM is huge!
In short, to keep cluster expansion to a minimum: if I can satisfy my IOPS requirement, can I safely store any amount of data on a single shard, or is the recommended volume much lower, and if so, for what reason?
Also, I know I must try to keep my index size below my RAM size to ensure efficient query execution. Here is an example:
If my data volume is 1 TB per shard and I have 48 GB of RAM on each of
the shard nodes, is there a way to estimate index size? The working set
is hard to estimate, since this is a data-logging system that will
update anything up to the total point count of data entering the
database every minute. That might be 30,000 documents, inserted once,
then updated with per-minute data for the whole day, then a new 30,000
documents the following day, then updated, and so on.
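For what it's worth, a rough back-of-the-envelope index estimate can be made from the ingest rate alone. The sketch below assumes two indexes (the mandatory `_id` index plus a hypothetical compound index on sensor ID and timestamp) and made-up per-entry byte costs; real sizes depend on key types, compression, and B-tree overhead, so treat `db.collection.totalIndexSize()` on a loaded sample as the authoritative number.

```python
# Back-of-the-envelope index-size estimate for the workload described:
# ~30,000 new documents per day. All per-entry byte costs below are
# assumptions -- measure with db.collection.totalIndexSize() in practice.

DOCS_PER_DAY = 30_000
DAYS_RETAINED = 365  # one year of data

# Assumed bytes per index entry (key bytes + per-entry overhead).
ID_INDEX_ENTRY = 12 + 16            # 12-byte ObjectId key + assumed overhead
SENSOR_TS_INDEX_ENTRY = 8 + 8 + 16  # hypothetical {sensorId, ts} compound index

def estimated_index_bytes(doc_count: int, entry_sizes: list[int]) -> int:
    """One entry per document per index, summed over all indexes."""
    return doc_count * sum(entry_sizes)

docs = DOCS_PER_DAY * DAYS_RETAINED
total = estimated_index_bytes(docs, [ID_INDEX_ENTRY, SENSOR_TS_INDEX_ENTRY])
print(f"{docs:,} docs -> ~{total / 1024**3:.2f} GiB of index data")
```

Under these assumptions, even a full year of inserts produces well under 1 GiB of index data, so at 30,000 docs/day the index is unlikely to be what pressures your 48 GB of RAM; the per-minute updates to the working set are.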
Best Answer
You'll need to increase the chunk size (default: 64 MB); otherwise you'll run into limits there.
http://blog.mongodb.org/post/100676030403/sharding-pitfalls-part-iii-chunk-balancing-and
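Chunk size is a cluster-wide setting stored in the config database, in megabytes. A sketch of raising it to 128 MB (an illustrative value; pick one that suits your own chunk-count math), run through a mongos:

```javascript
// In mongosh, connected to a mongos for the cluster.
use config
db.settings.updateOne(
    { _id: "chunksize" },
    { $set: { value: 128 } },  // e.g. 128 MB instead of the 64 MB default
    { upsert: true }
)
```

The new size applies to chunks as they are split going forward; existing chunks are not rewritten immediately.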
General limit information from MongoDB: