MongoDB – Mongo Map-Reduce or Sharding

mongodb

So I asked this question:

Setting up Mongo with clustering

But I didn't know enough until I read the answers and did more research. Is it better to use map-reduce or sharding for a fast-paced system with roughly 100 queries arriving at the same time? Sharding will distribute my data across the cluster, and it appears that it essentially does a map-reduce or a sort. If I were to use map-reduce for such queries, should I also run it on a cluster? If so, how easy is it to start with one Mongo server, then later move to a cluster and distribute the map-reduce tasks across it? Or am I confused about what I need to accomplish?

Best Answer

It's unclear what you mean by comparing map-reduce to sharding. But the short answer is: sharding.

Generally speaking, you design out map-reduce queries. You do not want hundreds of map-reduce queries executing at once: you would just overload Mongo, since that essentially means hundreds of full collection scans all running at the same time.
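To make "designing out" a map-reduce query a bit more concrete, here is a minimal sketch in plain JavaScript, using an in-memory array as a stand-in for a collection. The field names (username, country) and the helper functions are made up for illustration and are not taken from the question. The idea is to compare an aggregate that walks every document on each call with a total that is kept up to date at write time.

```javascript
// Toy in-memory stand-in for a collection; all names here are hypothetical.
const users = [];
const countryTotals = {};

// Write path: store the document and update a running total at the same time.
function insertUser(doc) {
  users.push(doc);
  countryTotals[doc.country] = (countryTotals[doc.country] || 0) + 1;
}

// Map-reduce style read: every call has to touch every document.
function countByCountryScan(docs) {
  let touched = 0;
  const counts = {};
  for (const doc of docs) {
    touched += 1;
    counts[doc.country] = (counts[doc.country] || 0) + 1;
  }
  return { counts, touched };
}

insertUser({ username: "alice", country: "DE" });
insertUser({ username: "bob", country: "US" });
insertUser({ username: "carol", country: "US" });

const scan = countByCountryScan(users);
console.log(scan.touched);     // 3: one query read the whole collection
console.log(countryTotals.US); // 2: read from the precomputed total, no scan
```

In MongoDB itself the write-time total would typically live in a small summary document that is updated with $inc, but whether that fits depends on which aggregates you actually need.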

If you have an example of one of your existing map-reduce queries, please add it to your question.

Regarding sharding, it all comes down to what you use for the sharding key.

If you shard a users collection on username, for example,

db.users.find()

will cause mongos to send the query to all shards and merge the result sets together (intelligently). Adding the sharding key to the query:

db.users.find().sort({username: 1}).limit(100);

would give mongos the option to talk to fewer mongod instances at a time.

A better example: if you query for

db.users.find({username: /^bob/})

mongos will send the query only to the shards whose shard key ranges indicate that they could contain matches. Most likely that is a single server, resulting in a fast query and no extra load on mongos.
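The difference between the unfiltered find() and the prefix query can be sketched with a toy routing table. This is plain JavaScript and only a rough model of range-based routing, not how mongos is actually implemented. The chunk boundaries, the shard names and the function names are assumptions made up for the example.

```javascript
// Toy model of range-based routing; NOT the real mongos implementation.
// Hypothetical chunk table: each chunk holds usernames in [min, max).
const chunks = [
  { min: "", max: "h", shard: "shardA" },
  { min: "h", max: "p", shard: "shardB" },
  { min: "p", max: "\uffff", shard: "shardC" },
];

// Return the shards whose chunks overlap the key range [lo, hi).
function shardsForRange(lo, hi) {
  const hit = chunks
    .filter((c) => c.min < hi && lo < c.max)
    .map((c) => c.shard);
  return [...new Set(hit)];
}

// No shard key in the query: every shard has to be asked.
function shardsForUnfilteredFind() {
  return shardsForRange("", "\uffff");
}

// Anchored prefix such as /^bob/: only keys in [prefix, prefix + "\uffff") can match.
function shardsForPrefix(prefix) {
  return shardsForRange(prefix, prefix + "\uffff");
}

console.log(shardsForUnfilteredFind()); // all three shards
console.log(shardsForPrefix("bob"));    // only shardA
```

The prefix only narrows the range because the regular expression is anchored with ^. An unanchored pattern such as /bob/ could match a username anywhere in the key space, so in this model it would have to go to every shard.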

Maybe the above examples are not news to you.

The queries you send to Mongo now use the same syntax you would use against a sharded database. The only thing you would do differently is analyse beforehand which keys to shard on, so that you can, where necessary, modify your queries to incorporate the sharding key and thereby enable mongos to act like a proxy instead of an aggregator.

A poor sharding key, or simply not taking advantage of sharding in the queries you are generating, would force mongos to query all mongod servers for every query, resulting in high load and poor performance.
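As a rough illustration of that load difference, the sketch below counts how many shard requests 100 lookups generate when each one can be routed to a single shard, versus when each one has to be broadcast. It is a toy model in plain JavaScript. The four shard names and the first-letter routing rule are invented for the example and have nothing to do with how MongoDB assigns chunks.

```javascript
// Toy load model; shard names and the routing rule are hypothetical.
const shardNames = ["shardA", "shardB", "shardC", "shardD"];

// Pretend routing rule: pick one shard from the first character of the username.
function shardFor(username) {
  return shardNames[username.charCodeAt(0) % shardNames.length];
}

// Count the requests each shard receives for a batch of lookups.
function shardRequests(usernames, usesShardKey) {
  const load = Object.fromEntries(shardNames.map((s) => [s, 0]));
  for (const name of usernames) {
    // With the shard key, one shard is asked; without it, all of them are.
    const targets = usesShardKey ? [shardFor(name)] : shardNames;
    for (const s of targets) load[s] += 1;
  }
  return load;
}

const total = (load) => Object.values(load).reduce((a, b) => a + b, 0);

// 100 lookups, roughly the concurrency mentioned in the question.
const names = Array.from({ length: 100 }, (_, i) =>
  String.fromCharCode(97 + (i % 26)) + i
);

console.log(total(shardRequests(names, true)));  // 100 requests in total
console.log(total(shardRequests(names, false))); // 400 requests in total
```

In this model every query that misses the shard key costs one request per shard, so the extra load grows with the size of the cluster instead of shrinking.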