MongoDB – In a sharded cluster, if 2 replica set nodes are offline leaving only 1 secondary, where do writes go?

mongodb

This is probably a simple question, but I can't find a definitive answer to it.

If I have 10 shards, each a 3-node replica set, and I lose 2 servers in one replica set so that its primary steps down to a read-only secondary, where does mongos send new writes whose hashed shard key would have routed them to that shard's primary?

Does the config server detect that the shard is read-only and tell mongos to redirect writes to another shard?

Also, assuming the above is correct: when the shard is fixed and has a primary again, will the chunks be rebalanced to it?

Thank you
fLo

Best Answer

If I have 10 shards, each a 3-node replica set, and I lose 2 servers in one replica set so that its primary steps down to a read-only secondary, where does mongos send new writes whose hashed shard key would have routed them to that shard's primary?

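Each shard holds a subset of the sharded cluster's data, and mongos routes every write to the shard that owns the chunk for that document's shard key. A 3-member replica set needs a majority of its voting members (2 of 3) to elect a primary, so with two members down the surviving member stays a secondary and the shard becomes read-only. Writes routed to that shard will then fail with errors. How failover and high availability work for an individual shard is governed by that shard's replica set configuration and deployment.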
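The majority arithmetic behind "read-only" can be checked with a small sketch. It assumes the default setup where every member is a voting member with one vote, and it ignores arbiters and member priorities; can_elect_primary is a hypothetical helper, not a MongoDB API.

```python
def can_elect_primary(voting_members_up: int, total_voting_members: int) -> bool:
    """Return True if the reachable voting members form a strict majority.

    A member can only become primary with votes from a strict majority of
    all voting members, so this is a necessary (not sufficient) condition.
    """
    if not 0 <= voting_members_up <= total_voting_members:
        raise ValueError("members up must be between 0 and the total")
    majority = total_voting_members // 2 + 1
    return voting_members_up >= majority


# A 3-node replica set needs 2 reachable voting members for a majority.
for up in range(3, -1, -1):
    print(f"{up} of 3 up -> primary possible: {can_elect_primary(up, 3)}")
```

With 2 of 3 members down, only 1 vote is available against a required majority of 2, so no primary can be elected until another member returns.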

Does the config server detect that the shard is read-only and tell mongos to redirect writes to another shard?

No. Redirecting those writes would require migrating the affected chunks, and the existing data in them, to another shard. The config servers and mongos do not monitor whether a shard is read-only or try to work around it: writes routed to that shard simply return errors until the shard has a primary again.
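A toy model may make the routing behaviour concrete. It is a simplified illustration, not MongoDB's implementation: ToyRouter, Shard and ShardUnavailableError are hypothetical names, chunks are reduced to sorted lower bounds over an assumed 64-bit signed hash space, and a shard's health is a single has_primary flag. What it shows is that the target shard is chosen from chunk ownership alone, so a write for a shard without a primary fails rather than being sent elsewhere.

```python
import bisect
from dataclasses import dataclass, field
from typing import List

MIN_HASH = -(2**63)  # assumed lower bound of the hashed key space (64-bit signed)


class ShardUnavailableError(Exception):
    """Hypothetical stand-in for the error mongos returns when a shard has no primary."""


@dataclass
class Shard:
    name: str
    has_primary: bool = True
    docs: List[dict] = field(default_factory=list)


@dataclass
class ToyRouter:
    chunk_lower_bounds: List[int]  # sorted; the first entry must be MIN_HASH
    chunk_owners: List[Shard]      # chunk_owners[i] owns [bounds[i], bounds[i + 1])

    def owner_for(self, key_hash: int) -> Shard:
        # Pick the chunk with the greatest lower bound <= key_hash.
        index = bisect.bisect_right(self.chunk_lower_bounds, key_hash) - 1
        if index < 0:
            raise ValueError("key_hash is below the first chunk's lower bound")
        return self.chunk_owners[index]

    def insert(self, key_hash: int, doc: dict) -> str:
        shard = self.owner_for(key_hash)
        # Only chunk ownership decides the target; there is no fallback shard.
        if not shard.has_primary:
            raise ShardUnavailableError(f"{shard.name} has no primary; write rejected")
        shard.docs.append(doc)
        return shard.name


shard_a, shard_b = Shard("shardA"), Shard("shardB")
router = ToyRouter([MIN_HASH, 0], [shard_a, shard_b])
shard_b.has_primary = False  # two of shardB's three members are down

print(router.insert(-5, {"_id": 1}))  # routed to shardA
try:
    router.insert(42, {"_id": 2})     # routed to shardB, which cannot accept writes
except ShardUnavailableError as err:
    print(err)
```

The first insert succeeds on shardA; the second is rejected because its key falls in shardB's range, and nothing in the routing step looks for another shard.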

If a shard does have an outage, one step you should take is to disable the balancer (for example, with sh.stopBalancer() in the mongo shell). This prevents attempts to migrate or rebalance chunks while you work on restoring that shard.
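The same step can be scripted, for example with PyMongo, by running the balancerStop and balancerStatus admin commands against a mongos. This is a sketch assuming MongoDB 3.4 or later (where those commands exist) and a client connected to a mongos rather than to a shard; stop_balancer is a hypothetical helper name, and it accepts any client exposing admin.command, such as a pymongo.MongoClient.

```python
from typing import Any, Mapping


def stop_balancer(mongos_client: Any) -> Mapping[str, Any]:
    """Disable the balancer through a mongos connection and confirm it is off.

    mongos_client is assumed to be something like pymongo.MongoClient pointed
    at a mongos; admin.command("name") sends {"name": 1} to the admin database.
    """
    mongos_client.admin.command("balancerStop")
    status = mongos_client.admin.command("balancerStatus")
    # balancerStatus reports mode "off" once the balancer is disabled.
    if status.get("mode") != "off":
        raise RuntimeError(f"balancer still enabled: {status!r}")
    return status
```

Checking balancerStatus afterwards, instead of trusting the stop command alone, gives an explicit confirmation before you start repairing the shard.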

When the shard is fixed and has a primary again, will the chunks be rebalanced to it?

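Once the shard has a primary again, normal operation resumes. Because mongos never redirected writes or moved chunks away during the outage, the shard still owns the same chunk ranges, so there is nothing to rebalance back to it. Remember to re-enable the balancer (for example, with sh.startBalancer()) if you disabled it during the outage.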
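Re-enabling can be scripted in a similar way, with a guard so the balancer is only restarted once the repaired shard reports a primary. This sketch assumes two PyMongo-style clients, one connected to a mongos and one connected directly to a member of the repaired shard's replica set, and a server version that supports the hello command (older versions use isMaster instead); restart_balancer is a hypothetical helper.

```python
from typing import Any, Mapping


def restart_balancer(mongos_client: Any, shard_member_client: Any) -> Mapping[str, Any]:
    """Re-enable the balancer only if the repaired shard has a primary.

    mongos_client must be connected to a mongos; shard_member_client to any
    member of the repaired shard's replica set.
    """
    # The hello response includes a "primary" field only while a primary is known.
    hello = shard_member_client.admin.command("hello")
    if not hello.get("primary"):
        raise RuntimeError("shard has no primary yet; leaving the balancer disabled")

    mongos_client.admin.command("balancerStart")
    status = mongos_client.admin.command("balancerStatus")
    # balancerStatus reports mode "full" when the balancer is enabled.
    if status.get("mode") != "full":
        raise RuntimeError(f"balancer did not report mode 'full': {status!r}")
    return status
```

With PyMongo, the shard member client could be created with directConnection=True so the hello command goes to that specific member rather than being routed elsewhere.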