Mongodb – Running Aggregation logic on multiple shards of MongoDB

mongodb

Suppose we have 5 shards in MongoDB having collection of data, and I have to write an Aggregation logic which should work on each of 5 shards in my cluster and collect data from these shards. Should it be taken care at the application developement side? like connecting to each Shard separatley by its shard key and get data Or once I write the aggregation logic and deploy my jar on this cluster it will be handled my MongoDB itself to read from these shards and work aggregation logic on these shards data?

Like in Cassandra MapReduce it will be handled by a Job tracker to send job to appropriate nodes.

Best Answer

You don't ever need to connect to specific shards in a MongoDB database. Instead, you connect to a mongos instance that handles the routing for you.

In your case, you would connect to the mongos instance normally, by typing mongo into the terminal, or through a language specific client. You send your aggregation operation to the mongos instance, and it will distribute the operations to each of the shards and combine the result at the end.

Sharding is, in many ways, transparent to the user (though certainly not transparent to the database architect): many queries that run on a non-sharded mongod instance will run in the same manner on a mongos instance.