Mongodb – Should I deploy Mongodb sharding for 50 collections

mongodb

I have a 10 node clusters that runs about 50 concurrent jobs. Each job needs to read/write a separate collection. So I have about 50 collections roughly. Each collection has about 20 M records. Most of the time, jobs only need to do sequential read/write.

For simplicity, I could deploy a single instance of mongodb that has no replication, no sharding. And have 50 separate collections. But the single node where mogodb is running becomes a hotspot and the rest 9 nodes can't share the read/write load.

So I would like to leverage the resource and balance the load. But I guess a 20M record collection is not worth sharding? Especially I only need to do sequential read/write. I thought about merging 50 collections into one big collection. But I am stumbled on the document size limitation which is 16 MB.

Any suggestions? Thanks,

Best Answer

You could use a sharded cluster to distribute the collections more evenly across your available hosts. In general I would recommend not going beyond 5 shards (with at least 2 data bearing nodes for redundancy, plus an arbiter to break ties).

Once you have your 5 shards (or however many you end up with), you can then "pin" the collections to a particular shard with tagging. I could explain that process in detail, but as usual, Kristina Chodorow has beaten me to it:

http://www.kchodorow.com/blog/2012/07/25/controlling-collection-distribution/

Although it is not required, and you can have a single database if you wish, I would suggest having more than a single database in order to benefit as much as possible. Locking is at the database level, so concurrent writes will still be competing if you have more than one collection on a shard in the same database.

In fact, there is a simpler solution to this than tagging (though tags are the most scalable solution). If you create separate databases for all of your collections (or at least a subset), you can have each database reside on a different shard and not shard the collections at all.

To explain: without enabling sharded collections, all data for a given database will reside on the "primary" shard for that database (primary shards are designated in a round-robin fashion). Hence if you create 5 databases on a 5 shard cluster (for example) each will have a different shard for its primary and you will have achieved a crude (but effective) distribution of load by using separate databases. I added more detail regarding this in a previous answer.

Related Solutions

Mongodb – Is sharding effective for small collections

Is it also effective for 10 000 collections with 10 000 documents each ?

Most people have the "single large collection" problem and so the sharding is clearly useful for reducing headaches of balancing this data.

However, when you have 10 000 small collections, your headache is probably not "balancing the data". With this many small collections your problem is likely about tracking these collections. Depending on your document size, you may not even break the lower limit for sharding to actually happen.

For the really small collections, you can use the little-known movePrimary command to manage the location of your data.

Of course, the other way to look at this is why do you have 10k collections? A collection doesn't need homogeneous objects and with 10k collections most of them have to be generated. It's quite possible to store different "types" of data in the same collection, reduce the number of collections and then include the type as part of the shard key.

Mongodb: drop all collections

db.dropDatabase() will drop the database, which will also drop all of the collections within a database.

If you need to see what databases you have, you can do show dbs.

Update: Here's a script to delete everything, make sure you really want to do this:

var dbs = db.getSisterDB('admin').adminCommand("listDatabases").databases
for(d in dbs) db.getSisterDB(dbs[d].name).dropDatabase();

Best Answer

Related Solutions

Mongodb – Is sharding effective for small collections

Mongodb: drop all collections

Related Question