MongoDB – sharding and replication on MongoDB

My question is about the answer given at "Difference between Sharding And Replication on MongoDB".

I need to split 75GB of data into 3 shards of 25GB each, with a replication factor of 3. The answer shows the picture below:

                            Sharded Cluster             
             /                    |                    \
      Shard A                  Shard B                  Shard C
        / \                      / \                      / \
+-------+ +---------+    +-------+ +---------+    +-------+ +---------+
|Primary| |Secondary|    |Primary| |Secondary|    |Primary| |Secondary|
|  25GB |=| 25GB    |    | 25 GB |=| 25 GB   |    | 25GB  |=| 25GB    |   
+-------+ +---------+    +-------+ +---------+    +-------+ +---------+

and says we need at least 6 database servers organized in three replica-sets. Each replica-set consists of two servers that hold the same 25GB of data.

My understanding is that each shard is a physical machine that holds the two sets of data below (primary and secondary) and hosts two separate mongod instances. One instance holds the primary data and the other holds the secondary data, which is a backup of the primary data on another shard.

  1. Primary data: this is 1/3 of the 75GB of data, distributed by some partition key

  2. Secondary data: this is a backup of the primary data on another shard

Application queries go only to the primary data on a given shard, chosen by the partition key. A secondary on one shard may be promoted to primary if the primary of another shard goes down. Is my understanding correct here?

Number of mongos and arbiter instances

Also, I believe I need to start only 1 mongos instance (which will be separate from the mongod instances)? Similarly, do I need a single arbiter for all 3 shards instead of 3, one on each shard? Can both mongos and the arbiter run on a separate machine or share the hardware of one of the shards, depending on requirements?

Best Answer

My understanding is that each shard is a physical machine holding below two sets of data (primary and secondary) and host two separate mongod server/instance.

No, each replica-set member is a physical machine. In the above graphic, each shard consists of two physical machines. While you can technically run multiple members of a replica-set on the same machine, there is nothing to gain from that, because a single hardware failure would then take out all of those members at once.
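To make the machine count concrete, here is a rough sketch of the arithmetic. The function name `cluster_footprint` and its parameters are made up for this illustration, and it assumes the data is spread evenly across shards and that arbiters and mongos are not counted.

```python
def cluster_footprint(total_gb, shards, members_per_shard):
    """Estimate machines and stored data for a sharded cluster.

    Assumes an even split across shards and one data-bearing
    replica-set member per physical machine (illustrative only).
    """
    per_shard_gb = total_gb / shards          # unique data held by each shard
    machines = shards * members_per_shard     # one machine per member
    stored_gb = per_shard_gb * machines       # every member stores a full copy
    return {
        "per_shard_gb": per_shard_gb,
        "machines": machines,
        "stored_gb": stored_gb,
    }


# The layout from the picture: 3 shards, a primary and one secondary each.
pictured = cluster_footprint(75, 3, 2)

# The layout the question asks for: 3 copies of every shard.
requested = cluster_footprint(75, 3, 3)

print(pictured)
print(requested)
```

Under these assumptions the pictured layout needs 6 machines storing 150GB in total, while a replication factor of 3 would need 9 machines storing 225GB.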

One will hold primary data and second will hold secondary data which is back up of primary data on another shard

No, the secondary will back up the data from the other member(s) of the same shard. But if you want to build the server-level equivalent of a mirroring RAID, you could of course put two members of different shards on the same physical machine.
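The point that replication and failover stay inside one shard can be pictured with a toy model. This is only a sketch: the `Shard` class and the host names are invented for the illustration, and it deliberately ignores the election vote that a real replica-set holds before promoting a member.

```python
from dataclasses import dataclass


@dataclass
class Shard:
    """One shard, modelled as a replica-set whose members hold the same data."""
    name: str
    members: list          # hypothetical host names; the first starts as primary
    primary: str = ""

    def __post_init__(self):
        if not self.primary:
            self.primary = self.members[0]

    def secondaries(self):
        """Members that copy their data from this shard's own primary."""
        return [m for m in self.members if m != self.primary]

    def fail_primary(self):
        """Drop the current primary and promote a secondary of the same shard."""
        survivors = self.secondaries()
        if not survivors:
            raise RuntimeError(f"shard {self.name} has no member left to promote")
        self.members = survivors
        self.primary = survivors[0]
        return self.primary


cluster = {
    "A": Shard("A", ["a-1", "a-2"]),
    "B": Shard("B", ["b-1", "b-2"]),
    "C": Shard("C", ["c-1", "c-2"]),
}

# Losing the primary of shard A only changes shard A.
new_primary = cluster["A"].fail_primary()
print(new_primary)
print(cluster["B"].primary, cluster["C"].primary)
```

In this model the replacement primary always comes from the failed primary's own shard; members of shards B and C never take over for shard A because they hold different data.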

i believe i need to start only 1 mongos instance

You only need one, but to avoid a single point of failure, you might want to have more than one.

Similarly I need to have single mongo arbiter for all 3 shards instead of 3 on each shard?

An arbiter can only be a member of a single replica-set, so you need 3 arbiters. However, arbiters are very lightweight. They don't hold any data and don't do anything unless there is a primary election. So you do not need a dedicated server for the arbiter processes.
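Why each shard wants its own arbiter can be sketched with the majority rule for elections: a primary can only be elected when a strict majority of the voting members is reachable. The helper names below are hypothetical and the model counts votes only, nothing else.

```python
def can_elect_primary(voting_members, reachable_members):
    """True if the reachable members form a strict majority of the voters."""
    return reachable_members > voting_members // 2


def arbiters_needed(shards):
    """One arbiter per replica-set, since an arbiter belongs to exactly one."""
    return shards


# Two data-bearing members, the primary's machine is lost: 1 of 2 votes left.
without_arbiter = can_elect_primary(voting_members=2, reachable_members=1)

# Same failure with an arbiter added: 2 of 3 votes left.
with_arbiter = can_elect_primary(voting_members=3, reachable_members=2)

print(without_arbiter, with_arbiter, arbiters_needed(3))
```

With two members per shard, the surviving member alone cannot win an election, while the same shard with an arbiter can. That is the reason for one arbiter per replica-set in the pictured layout.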