MongoDB Scalability – Scaling Out on the Same Storage

mongodbscalability

We are planning to have some tests to predict how mongo db operates and functions with 2 terabyte of inforamtion on a single collection.

So the plan is to have 15-20 sharded machines (on the same environment / cabinet).
All are considering to have shared EMC storage

I wanted to ask before we start the test :

  1. IS there any different between:
    * Having physical machines writing to independent disks (on the machine).
    * Having physical machines writing to the same storage.
    * Having virtual machines writing to same storage.
  2. About memory consumption, What are the hardware considerations about querying this amount of data? If I understand correctly, the advantage when you have massive writes is to do sharding with small-level servers, but is it going to change when you need to have some reads?

Thanks

Best Answer

  1. If you have multiple databases, writing to independent disks (with the directory per db) option will be your best bet for performance. Even if you just have one big huge sharded database, you can still get some performance boost with multiple disks by putting your journal and local database (database where replication info goes) on separate disks, but your main database data files would all have to be on a single disk.
  2. Not sure what you're after here. The recommendation is that your working set fit in memory for best performance and that still applies for sharding, its just split across multiple servers and thus your memory requirements are also split up across those servers. That will of course also depend on how well you pick your sharding key, hopefully the load is distributed evenly across all the nodes. If you plan on returning lots of data from your queries then you might need extra memory on your mongo-s servers as they have to do some sorting and aggregation of what you're bringing back.