MongoDB: Manual failover and failback between two datacenters

Tags: failover, high-availability, mongodb

This question is about MongoDB datacenter failover and failback. We're thinking about using two datacenters and the following replica set:

DC1-M1  prio 1, votes 1  (datacenter 1, member 1)
DC1-M2  prio 1, votes 1
DC2-M3  prio 0, votes 1
DC2-M4  prio 0, votes 0   <-- please note: 0 votes. DC1 has a majority of the voters
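
For concreteness, here is a sketch of how that configuration might be created in the mongo shell. The host names are made up; everything else mirrors the member list above:

    // Hypothetical host names; run once against the member that should become DC1-M1.
    rs.initiate({
      _id: "rs0",
      members: [
        { _id: 0, host: "dc1-m1.example.com:27017", priority: 1, votes: 1 },
        { _id: 1, host: "dc1-m2.example.com:27017", priority: 1, votes: 1 },
        { _id: 2, host: "dc2-m3.example.com:27017", priority: 0, votes: 1 },
        { _id: 3, host: "dc2-m4.example.com:27017", priority: 0, votes: 0 }
      ]
    })
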
  1. We think that we'd like to do datacenter failover manually (because other parts
    of the failover process happen manually). We'd do failover by running
    rs.reconfig() in DC2, including only members DC2-M3 and DC2-M4 and
    setting their priority and votes to 1. Then we'd get a replica set in DC2
    with two members only (M3 and M4), and one of them would become the primary.
    (See the first sketch after this list.)

    Does this make sense, or can you think of any problems? I think we'd need to
    specify force: true when running rs.reconfig(), since we'd be connected to a
    secondary. We'd basically follow these instructions:
    http://docs.mongodb.org/manual/tutorial/reconfigure-replica-set-with-unavailable-members/#reconfigure-by-forcing-the-reconfiguration

  2. We'd do datacenter failback to DC1 by running rs.reconfig() again, adding
    back the members in DC1. (See the second sketch after this list.)

    Do you see any problems with this?

    What happens if DC1 has accepted a few writes after we did the manual failover to
    DC2? Then we have a split-brain problem, and what happens when we run
    rs.reconfig() and add back the members in DC1? Will MongoDB's automatic
    rollback system kick in? (See
    http://docs.mongodb.org/manual/core/replica-set-rollbacks/ ) But will it keep
    the changes made in DC1 or those made in DC2? It'd be important that the
    changes in DC2 were kept; how can we ensure that those changes aren't lost?
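
First sketch (step 1): the forced reconfiguration in DC2, assuming the hypothetical host names and member _id values from the configuration sketch above. Keeping the original _id values for the surviving members matters, so they are not treated as new members.

    // Run on DC2-M3 while DC1 is unreachable. There is no primary, so the
    // reconfiguration has to be forced from a secondary.
    cfg = rs.conf()
    cfg.members = [
      { _id: 2, host: "dc2-m3.example.com:27017", priority: 1, votes: 1 },
      { _id: 3, host: "dc2-m4.example.com:27017", priority: 1, votes: 1 }
    ]
    rs.reconfig(cfg, { force: true })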
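
Second sketch (step 2): the failback, adding the DC1 members back from the current DC2 primary and restoring the original priorities and votes. Whether the writes that DC1 may have accepted in the meantime survive this step is exactly the rollback question above.

    // Run on the current primary in DC2 once DC1 is reachable again.
    // Newer MongoDB versions may require splitting this into several
    // reconfig steps (one voting-membership change at a time).
    cfg = rs.conf()
    cfg.members.push({ _id: 0, host: "dc1-m1.example.com:27017", priority: 1, votes: 1 })
    cfg.members.push({ _id: 1, host: "dc1-m2.example.com:27017", priority: 1, votes: 1 })
    // Restore the original DC2 settings so that DC1 holds the voting majority again.
    cfg.members.forEach(function (m) {
      if (m._id === 2) { m.priority = 0; m.votes = 1 }
      if (m._id === 3) { m.priority = 0; m.votes = 0 }
    })
    rs.reconfig(cfg)
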

Background:

We have 2 datacenters, and want DC1 to survive an outage of DC2. So we want to
keep the majority of the voters in DC1. At the same time, we'd like to be able
to fail over to DC2, and fail back to DC1. We'd prefer not to involve any third
datacenter (e.g. a third datacenter with an arbiter), because our team
thinks it's bad to have dependencies on an additional datacenter. And we
currently think that we want to do failover manually anyway because other
parts of the datacenter failover process happen manually.

Best regards,
KajMagnus

Best Answer

I do not understand why the failover to DC2 has to be done manually (even if other parts have to be done manually: one less thing on your to-do list in case of a major failure is always a good thing!).

In general, my feeling is that there are conceptual flaws in your setup.

Here is how I would do it and why.

  1. I would not have manual failover. It is better to have slow access than none. What will happen with the current configuration is that if DC1 (which holds both the primary and the majority of the voters) becomes unavailable, the single remaining voter in DC2 cannot form a majority, and therefore the whole set will drop to secondary state, effectively turning the cluster into read-only mode until you reconfigure it by hand. So even when everything in DC2 is fine, an outage of DC1 is a showstopper. With this setup, you are artificially creating a single point of failure, effectively going against the whole idea of a cluster, let alone a multi-DC setup. Sounds like a Very Bad Idea™ to me. Automatic failover, even to DC2, sounds like a better idea. Slower reads and (depending on your write concern) slower writes are still better than read-only mode.
  2. I would have a third datacenter with only one instance: an arbiter. An arbiter can easily be run on a micro machine, as it is only involved in elections, and an election is a cheap task in terms of RAM and computation power. The arbiter helps the set always have a majority: if one DC gets disconnected for whatever reason, the other DC and the arbiter still form a majority. So if one DC goes down, you only have to worry about the other parts of your application; you don't have to wait for someone to reconfigure the replica set by hand. (See the sketch after this list.)
  3. I am pretty sure that automatic failover for the other parts of your application can be achieved with some time and effort. Especially if you store all data in MongoDB and have some sort of session replication available, it should be quite easy. Whether implementing automatic failover is worth the effort is pretty easy to calculate: get your average downtime and find out how big the losses created by this downtime are in terms of money and customer satisfaction (if applicable). If the cost of implementing automatic failover is below or equal to that, go for automatic failover. I can help you with that if needed.
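
To make point 2 concrete, here is a sketch of the kind of configuration I mean (host names are hypothetical; the arbiter lives in a small third location):

    // Two data-bearing members per datacenter plus an arbiter in a third location.
    rs.initiate({
      _id: "rs0",
      members: [
        { _id: 0, host: "dc1-m1.example.com:27017", priority: 2 },
        { _id: 1, host: "dc1-m2.example.com:27017", priority: 2 },
        { _id: 2, host: "dc2-m3.example.com:27017", priority: 1 },
        { _id: 3, host: "dc2-m4.example.com:27017", priority: 1 },
        { _id: 4, host: "arbiter.dc3.example.com:27017", arbiterOnly: true }
      ]
    })

With five voters, losing either datacenter still leaves three, so a primary can always be elected automatically in the surviving one, and the higher priorities keep the primary in DC1 whenever DC1 is available.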