Mongodb – How to deploy MongoDB config servers to achieve maximum availability during network partitions

mongodbsharding

What's the recommended way to deploy config servers?

Assume, we have two shards, each one is a 3-nodes replica set.
If I deploy all config servers together with one of these shards, a single network partition can make the second shard completely not functional.

If I deploy 3 config servers in a distributed fashion, then after a network partition both shards will be available, but in read-only mode.

If I deploy config servers to a separate datacenter, it will just move the point of failure from shard-to-shard network part to shard-to-datacenter network part.

Any better options?

Best Answer

If you lose one of the three config servers, you just halt migrations, so in a balanced system where you're not exceeding 80% capacity, you should have hours and hours of time to page the duty DBA and bring back that one server.

You could virtualize them or place them on management band / highly reliable servers if you really can't afford to have down time for the cluster to perform chunk migration over a long weekend or other situation where you're not attentive to the config servers.

Most of your planning will be to make sure you understand the network partition scenarios and that the primary stays running and you have enough secondaries to serve read loads as needed. It's hard to generalize other than:

  • Add arbiters to keep the voting members an odd number.
  • Test your outage scenarios
  • Scale the system so you can actually perform rolling outages and verify that things work well without killing your application in practice. You might not test at peak load, but you should be able to take rolling outages routinely several times a week until you're confident in your tools, scripts and configuration.