This is about MongoDB and datacenter failover and failback. We're thinking about using two datacenters and this MongoDB replica set:
DC1-M1 prio 1, votes 1 (datacenter 1, member 1)
DC1-M2 prio 1, votes 1
DC2-M3 prio 0, votes 1
DC2-M4 prio 0, votes 0 <-- please note: 0 votes, so DC1 has a majority of voters

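To make the vote arithmetic concrete, here is a minimal sketch of the planned member list as it might appear in a replica set config, plus two small helpers that count votes. The host names are hypothetical placeholders and the helper names are made up for illustration; only the priorities and votes come from the list above.

```javascript
// Hypothetical host names; priorities and votes are taken from the list above.
var plannedMembers = [
  { _id: 0, host: "dc1-m1.example.net:27017", priority: 1, votes: 1 },
  { _id: 1, host: "dc1-m2.example.net:27017", priority: 1, votes: 1 },
  { _id: 2, host: "dc2-m3.example.net:27017", priority: 0, votes: 1 },
  { _id: 3, host: "dc2-m4.example.net:27017", priority: 0, votes: 0 }
];

// Sum the votes of the members whose host name starts with the given prefix.
function votesFor(members, hostPrefix) {
  return members
    .filter(function (m) { return m.host.indexOf(hostPrefix) === 0; })
    .reduce(function (sum, m) { return sum + m.votes; }, 0);
}

// Smallest number of votes that forms a strict majority of all votes.
function majorityOf(members) {
  var total = members.reduce(function (sum, m) { return sum + m.votes; }, 0);
  return Math.floor(total / 2) + 1;
}
```

With these values there are three votes in total, so two are needed for a majority: DC1 holds two and DC2 holds one, which is why DC1 can keep a primary if DC2 goes away but not the other way round.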
We'd like to do datacenter failover manually (because other parts
of the failover process happen manually). We'd do failover by running
rs.reconfig() in DC2, including only members DC2-M3 and DC2-M4 and
setting their priority and votes to 1. We'd then have a replica set in DC2
with only two members (M3 and M4), and one of them would become the primary.

Does this make sense, or can you think of any problems? I think we'd need to
specify force: true when running rs.reconfig(), since we'd be connected to a
secondary. We'd basically follow these instructions:
http://docs.mongodb.org/manual/tutorial/reconfigure-replica-set-with-unavailable-members/#reconfigure-by-forcing-the-reconfiguration

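A rough sketch of what that forced failover could look like. In the mongo shell the helper is named rs.reconfig(), and it accepts { force: true } as its second argument. The function name buildFailoverConfig and the host names are hypothetical; the function only builds the new config object, and the shell calls that would apply it are shown as comments.

```javascript
// Build a config that keeps only the surviving members and makes each of them
// an electable voter. Members keep their original _id values.
function buildFailoverConfig(currentCfg, survivingHosts) {
  var members = currentCfg.members
    .filter(function (m) { return survivingHosts.indexOf(m.host) !== -1; })
    .map(function (m) {
      // Copy the member document so the original config is not modified.
      return Object.assign({}, m, { priority: 1, votes: 1 });
    });
  return Object.assign({}, currentCfg, { members: members });
}

// In the mongo shell, connected to DC2-M3 or DC2-M4 (host names are assumptions):
//   var cfg = buildFailoverConfig(rs.conf(), [
//     "dc2-m3.example.net:27017",
//     "dc2-m4.example.net:27017"
//   ]);
//   rs.reconfig(cfg, { force: true });
```

Building the config from rs.conf() rather than typing it from scratch keeps the replica set name and other settings unchanged, so only the member list differs.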
We'd do datacenter failback to DC1 by running rs.reconfig() again and
adding back the members in DC1.

Do you see any problems with this?
What happens if DC1 has accepted a few writes after we did the manual failover
to DC2? Then we have a split-brain problem. What happens when we run
rs.reconfig() and add back the members in DC1? Will MongoDB's automatic
rollback system kick in? (See
http://docs.mongodb.org/manual/core/replica-set-rollbacks/ ) And will it keep
the changes made in DC1 or those made in DC2? It's important that the
changes in DC2 are kept; how can we ensure that those changes aren't lost?

Background:
We have 2 datacenters, and want DC1 to survive an outage of DC2. So we want to
keep the majority of the voters in DC1. At the same time, we'd like to be able
to fail over to DC2, and fail back to DC1. We'd prefer not to involve any third
datacenter (e.g. a third datacenter with an arbiter), because our team
thinks it's bad to have dependencies on an additional datacenter. And we
currently think that we want to do failover manually anyway because other
parts of the datacenter failover process happen manually.
Best regards,
KajMagnus
Best Answer
I do not understand why the failover to DC2 has to be done manually. Even if other parts have to be done manually, one thing less on your to-do list in case of a major failure is always a good thing!
In general, my feeling is that there are conceptual flaws in your setup.
Here is how I would do it and why.