MongoDB – MongoDB sharding / replica sets concept questions

mongodb sharding

I have a few questions about sharding and replica sets:

  1. I am planning to write important logs to a sharded environment, about 100 million rows a day. If I connect to a single mongos process, won't it be overloaded? Can I define more entry points to the database?
  2. Can a config server run on the same servers as the shard servers?
  3. If you look at this image: http://i.stack.imgur.com/lcp4X.png , at the bottom there are 2 shard nodes that are each a replica set. How can a shard be a replica set? Isn't a replica set a full copy of the data, and a shard a subset of the data? How can they go together?

Here is what I don't get:

"… A replica set is a group consisting of 1 primary and N secondaries. If you are sharding, each group will be a shard. That means that the single primary and all secondaries of the same group will have the same data (replicated)…"

From my understanding:
A replica set is one primary that accepts all the data.
Its data gets synced to the secondaries.

Sharding is a set of instances that each accept a subset of the data.

So, if a replica set has all the data, and a shard has only some of the data (on each server), how can they live together?

Best Answer

1 - You can run more than one mongos instance and connect to whichever you want (the client driver should have that option). Keep in mind that a mongos is just a router: it only routes requests to the correct shard(s).
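As a rough illustration of point 1, a MongoDB connection string can list several hosts separated by commas, so a client can be pointed at more than one mongos. The sketch below only builds such a string; the helper name `build_mongos_uri` and the host names are made up for the example, and the default port 27017 is an assumption about the deployment.

```python
def build_mongos_uri(hosts, database, port=27017):
    """Build a connection string that lists several mongos routers.

    `hosts` are hypothetical mongos host names; the port is assumed to be
    the same for every router.
    """
    if not hosts:
        raise ValueError("at least one mongos host is required")
    host_list = ",".join(f"{host}:{port}" for host in hosts)
    return f"mongodb://{host_list}/{database}"


# Hypothetical routers; replace with the real mongos hosts.
uri = build_mongos_uri(["mongos1.example.net", "mongos2.example.net"], "logs")
print(uri)
# prints: mongodb://mongos1.example.net:27017,mongos2.example.net:27017/logs
```

With a driver such as PyMongo installed, a string like this would typically be passed to `MongoClient(uri)`. How the driver spreads requests across the listed routers depends on the driver and its version.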

2 - Yes, a config server can run on the same machine as a primary/secondary; just don't put the config instances together on one machine (you are required to have 3 for redundancy).

3 - A replica set is a group consisting of 1 primary and N secondaries. If you are sharding, each group will be a shard. That means the single primary and all secondaries of the same group hold the same data (replicated), but that data is only the shard's subset of the whole collection, not the full data set. Replication happens inside a shard; partitioning happens between shards. Also, adding more secondaries won't increase write performance, as only the primary can perform inserts/updates/deletes. Given this, the only way to increase write performance is to add more shards (horizontal scaling) and, of course, to choose a good shard key that balances data across all shards (you don't want all your data in 1 shard while the others remain empty). For further info, check this official doc explaining the concept of a replica set and its members: http://docs.mongodb.org/manual/core/replica-set-members/

Also, the instances of a replica set shard (the primary and N secondaries) are not required to be on the same machine. You might want to put them on separate machines to increase redundancy and perhaps get a better distribution of the load.
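To make point 3 concrete, here is a toy in-memory model, not MongoDB code. It assumes two shards, two secondaries per shard, synchronous copying, and routing by an MD5 hash of the key modulo the shard count. Real MongoDB replicates asynchronously and assigns ranges (chunks) of the shard key to shards, so the class names and the routing rule here are simplifications invented for the example.

```python
import hashlib


class ReplicaSet:
    """One shard: a primary plus N secondaries that all hold the same data."""

    def __init__(self, name, secondaries=2):
        self.name = name
        self.primary = {}
        self.secondaries = [{} for _ in range(secondaries)]

    def write(self, key, doc):
        # Only the primary accepts the write; secondaries just copy it.
        self.primary[key] = doc
        for member in self.secondaries:
            member[key] = doc


class ShardedCluster:
    """Routes each document to exactly one shard, like a mongos would."""

    def __init__(self, shard_count):
        self.shards = [ReplicaSet(f"shard{i}") for i in range(shard_count)]

    def shard_for(self, key):
        # Simplified routing: hash the shard key and take it modulo the
        # number of shards (an assumption made for this sketch).
        digest = hashlib.md5(str(key).encode()).hexdigest()
        return self.shards[int(digest, 16) % len(self.shards)]

    def insert(self, key, doc):
        self.shard_for(key).write(key, doc)


cluster = ShardedCluster(shard_count=2)
for i in range(1000):
    cluster.insert(i, {"line": f"log entry {i}"})

for shard in cluster.shards:
    replicated = all(member == shard.primary for member in shard.secondaries)
    print(shard.name, len(shard.primary), "docs, replicated:", replicated)
```

Each shard ends up with only part of the 1000 documents, while every member inside a shard holds an identical copy of that part. Adding secondaries to `ReplicaSet` would add copies but not write capacity; adding shards to `ShardedCluster` is what spreads the writes.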