MongoDB: Manual failover and failback between two datacenters

Tags: failover, high-availability, mongodb

This question is about MongoDB datacenter failover and failback. We're thinking about using two datacenters and the following replica set:

DC1-M1  prio 1, votes 1  (datacenter 1, member 1)
DC1-M2  prio 1, votes 1
DC2-M3  prio 0, votes 1
DC2-M4  prio 0, votes 0   <-- please note: 0 votes. DC1 has a majority of the voters
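
For concreteness, here is a sketch of how that configuration might be created in the mongo shell. The host names are made up; everything else mirrors the member list above:

    // Hypothetical host names; run once against the member that should become DC1-M1.
    rs.initiate({
      _id: "rs0",
      members: [
        { _id: 0, host: "dc1-m1.example.com:27017", priority: 1, votes: 1 },
        { _id: 1, host: "dc1-m2.example.com:27017", priority: 1, votes: 1 },
        { _id: 2, host: "dc2-m3.example.com:27017", priority: 0, votes: 1 },
        { _id: 3, host: "dc2-m4.example.com:27017", priority: 0, votes: 0 }
      ]
    })
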
  1. We think that we'd like to do datacenter failover manually (because other parts
    of the failover process happen manually). We'd do failover by running
    rs.reconfig() in DC2, including only members DC2-M3 and DC2-M4 and
    setting their priority and votes to 1. Then we'd get a replica set in DC2
    with two members only (M3 and M4), and one of them would become the primary.
    (See the first sketch after this list.)

    Does this make sense, or can you think of any problems? I think we'd need to
    specify force: true when running rs.reconfig(), since we'd be connected to a
    secondary. We'd basically follow these instructions:
    http://docs.mongodb.org/manual/tutorial/reconfigure-replica-set-with-unavailable-members/#reconfigure-by-forcing-the-reconfiguration

  2. We'd do datacenter failback to DC1 by running rs.reconfig() again, adding
    back the members in DC1. (See the second sketch after this list.)

    Do you see any problems with this?

    What happens if DC1 has accepted a few writes after we did the manual failover to
    DC2? Then we have a split-brain problem, and what happens when we run
    rs.reconfig() and add back the members in DC1? Will MongoDB's automatic
    rollback system kick in? (See
    http://docs.mongodb.org/manual/core/replica-set-rollbacks/ ) But will it keep
    the changes made in DC1 or those made in DC2? It'd be important that the
    changes in DC2 were kept; how can we ensure that those changes aren't lost?
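
First sketch (step 1): the forced reconfiguration in DC2, assuming the hypothetical host names and member _id values from the configuration sketch above. Keeping the original _id values for the surviving members matters, so they are not treated as new members.

    // Run on DC2-M3 while DC1 is unreachable. There is no primary, so the
    // reconfiguration has to be forced from a secondary.
    cfg = rs.conf()
    cfg.members = [
      { _id: 2, host: "dc2-m3.example.com:27017", priority: 1, votes: 1 },
      { _id: 3, host: "dc2-m4.example.com:27017", priority: 1, votes: 1 }
    ]
    rs.reconfig(cfg, { force: true })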
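
Second sketch (step 2): the failback, adding the DC1 members back from the current DC2 primary and restoring the original priorities and votes. Whether the writes that DC1 may have accepted in the meantime survive this step is exactly the rollback question above.

    // Run on the current primary in DC2 once DC1 is reachable again.
    // Newer MongoDB versions may require splitting this into several
    // reconfig steps (one voting-membership change at a time).
    cfg = rs.conf()
    cfg.members.push({ _id: 0, host: "dc1-m1.example.com:27017", priority: 1, votes: 1 })
    cfg.members.push({ _id: 1, host: "dc1-m2.example.com:27017", priority: 1, votes: 1 })
    // Restore the original DC2 settings so that DC1 holds the voting majority again.
    cfg.members.forEach(function (m) {
      if (m._id === 2) { m.priority = 0; m.votes = 1 }
      if (m._id === 3) { m.priority = 0; m.votes = 0 }
    })
    rs.reconfig(cfg)
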

Background:

We have 2 datacenters, and want DC1 to survive an outage of DC2. So we want to
keep the majority of the voters in DC1. At the same time, we'd like to be able
to fail over to DC2, and fail back to DC1. We'd prefer not to involve any third
datacenter (e.g. a third datacenter with an arbiter), because our team
thinks it's bad to have dependencies on an additional datacenter. And we
currently think that we want to do failover manually anyway because other
parts of the datacenter failover process happen manually.

Best regards,
KajMagnus

Best Answer

I do not understand why the failover to DC2 has to be done manually (even if other parts have to be done manually: one less thing on your to-do list in case of a major failure is always a good thing!).

In general, my feeling is that there are conceptual flaws in your setup.

Here is how I would do it and why.

  1. I would not have manual failover. It is better to have slow access than none. What will happen with the current configuration is that if DC1 (which holds both the primary and the majority of the voters) becomes unavailable, the single remaining voter in DC2 cannot form a majority, and therefore the whole set will drop to secondary state, effectively turning the cluster into read-only mode until you reconfigure it by hand. So even when everything in DC2 is fine, an outage of DC1 is a showstopper. With this setup, you are artificially creating a single point of failure, effectively going against the whole idea of a cluster, let alone a multi-DC setup. Sounds like a Very Bad Idea™ to me. Automatic failover, even to DC2, sounds like a better idea. Slower reads and (depending on your write concern) slower writes are still better than read-only mode.
  2. I would have a third datacenter with only one instance: an arbiter. An arbiter can easily be run on a micro machine, as it is only involved in elections, and an election is a cheap task in terms of RAM and computation power. The arbiter helps the set always have a majority: if one DC gets disconnected for whatever reason, the other DC and the arbiter still form a majority. So if one DC goes down, you only have to worry about the other parts of your application; you don't have to wait for someone to reconfigure the replica set by hand. (See the sketch after this list.)
  3. I am pretty sure that automatic failover for the other parts of your application can be achieved with some time and effort. Especially if you store all data in MongoDB and have some sort of session replication available, it should be quite easy. Whether implementing automatic failover is worth the effort is pretty easy to calculate: get your average downtime and find out how big the losses created by this downtime are in terms of money and customer satisfaction (if applicable). If the cost of implementing automatic failover is below or equal to that, go for automatic failover. I can help you with that if needed.
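
To make point 2 concrete, here is a sketch of the kind of configuration I mean (host names are hypothetical; the arbiter lives in a small third location):

    // Two data-bearing members per datacenter plus an arbiter in a third location.
    rs.initiate({
      _id: "rs0",
      members: [
        { _id: 0, host: "dc1-m1.example.com:27017", priority: 2 },
        { _id: 1, host: "dc1-m2.example.com:27017", priority: 2 },
        { _id: 2, host: "dc2-m3.example.com:27017", priority: 1 },
        { _id: 3, host: "dc2-m4.example.com:27017", priority: 1 },
        { _id: 4, host: "arbiter.dc3.example.com:27017", arbiterOnly: true }
      ]
    })

With five voters, losing either datacenter still leaves three, so a primary can always be elected automatically in the surviving one, and the higher priorities keep the primary in DC1 whenever DC1 is available.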