MongoDB – Set master after re-initiation

mongodb

Using version 2.6 (I know…)

My goal is to test a failover.
There are 3 nodes in a replica set.

- node1:Master
- node2:Secondary
- node3:Secondary

I shut down node1 and node2, which leaves node3 as a secondary in READ-ONLY mode.

Because the application requires changes to be written to a mongo node, the config of node3 is changed so that it leaves the replica set and acts as a single node. Now node3 is accepting writes. This is fine so far.

When I re-enable the replica set, node3 comes back as a secondary rather than as master, so it loses its writes and tries to catch up with whichever of node1 or node2 is master. This is the part where I lose my data.

The question is:

How can I ensure no data is lost on node3 and enable the replica set again?

Best Answer

> Because the application requires changes to be written to a mongo node, the config of node3 is changed so that it leaves the replica set and acts as a single node.

A normal replica set failover situation involves one or more members being unavailable, subject to the fault tolerance of your replica set configuration. As long as a majority of configured voting members in the replica set are healthy, they should be able to automatically elect a new primary without manual intervention.
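As a rough illustration of the majority rule, the following sketch checks whether the healthy members can still elect a primary. The function names are made up for this example; this is plain arithmetic, not a MongoDB API.

```python
def majority_needed(voting_members: int) -> int:
    """Smallest number of votes that is a strict majority of the voting members."""
    return voting_members // 2 + 1


def can_elect_primary(voting_members: int, healthy_members: int) -> bool:
    """True if the healthy voting members form a majority and can elect a primary."""
    return healthy_members >= majority_needed(voting_members)


# Three-member replica set from the question.
print(can_elect_primary(3, 2))  # one node down: True
print(can_elect_primary(3, 1))  # node1 and node2 down: False
```

With two of three members down, the lone survivor cannot reach a majority, which is why node3 stays a read-only secondary.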

If a majority of members are unavailable and you need to manually reconfigure the replica set to recover write availability, the correct process to follow is a forced reconfiguration with the surviving member(s). This process will ensure that the replica set version information is updated so when former members rejoin they will detect the new configuration and resume syncing if possible.
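In the mongo shell, a forced reconfiguration is typically done by fetching the config with rs.conf(), trimming its members array to the survivors, and calling rs.reconfig(cfg, {force: true}) on the surviving member. The sketch below models only the config edit as a pure function. The set name "rs0", the port 27017, and the helper name are assumptions for illustration, and the version bump is illustrative, since on a real forced reconfig the shell and server manage the version.

```python
import copy


def surviving_config(config: dict, surviving_hosts: set) -> dict:
    """Return a copy of the replica set config that keeps only surviving members."""
    new_config = copy.deepcopy(config)
    new_config["members"] = [
        m for m in new_config["members"] if m["host"] in surviving_hosts
    ]
    if not new_config["members"]:
        raise ValueError("at least one surviving member is required")
    # Illustrative only: the new config must carry a higher version than the old one.
    new_config["version"] = config["version"] + 1
    return new_config


# Hypothetical config resembling the three-node set in the question.
current = {
    "_id": "rs0",
    "version": 3,
    "members": [
        {"_id": 0, "host": "node1:27017"},
        {"_id": 1, "host": "node2:27017"},
        {"_id": 2, "host": "node3:27017"},
    ],
}

forced = surviving_config(current, {"node3:27017"})
print(forced["members"])  # only node3 remains
```

The resulting document is what would be passed to the forced reconfig while node3 is still running as a replica set member (that is, with replSet still set).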

In this disaster scenario, you are effectively reconfiguring your replica set to have a single member (node3), and later re-adding other members to rebuild the replica set.

> Now node3 is accepting writes. This is fine so far.

This is actually not fine, as far as replication goes. If you restart a mongod that is part of a replica set without the replSet parameter and start writing in standalone mode, the standalone data will diverge from the other members of the replica set. Any standalone writes are not noted in the replication oplog, and there is no way to reconcile local changes with other writes that may have happened in the replica set.

> When I re-enable the replica set, node3 comes back as a secondary rather than as master, so it loses its writes and tries to catch up with whichever of node1 or node2 is master. This is the part where I lose my data.

If you later rejoin that mongod to the same replica set, the assumption will be that no writes have happened outside of the oplog. The member may also have an older view of the replica set configuration if there have been changes since the member was offline. Replication will try to sync to a common point in the oplog of a current replica set member (assuming there is one), but you will have introduced data inconsistency via direct updates in standalone mode. Local writes will not be rolled back (or replicated) because the oplog has no record of the changes. This can lead to data loss and other conflicts, because node3 breaks the requirement that all members have the same data (aside from oplog entries not applied yet due to replication lag).
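The divergence can be seen in a toy model. This is a deliberately simplified sketch and not how MongoDB implements replication: each member has a data dictionary and an oplog list, replicated writes touch both, and standalone writes touch only the data. All class and function names are hypothetical.

```python
class Member:
    """Toy replica set member: a data store plus an oplog of applied writes."""

    def __init__(self):
        self.data = {}
        self.oplog = []  # list of (key, value) entries

    def replicated_write(self, key, value):
        """Write as a replica set member: the change is recorded in the oplog."""
        self.data[key] = value
        self.oplog.append((key, value))

    def standalone_write(self, key, value):
        """Write in standalone mode: the data changes but the oplog does not."""
        self.data[key] = value


def sync_from(target, source):
    """Apply the source oplog entries that the target's oplog does not have yet."""
    for key, value in source.oplog[len(target.oplog):]:
        target.replicated_write(key, value)


node1, node3 = Member(), Member()
node1.replicated_write("a", 1)
sync_from(node3, node1)          # both members agree: {"a": 1}

node3.standalone_write("a", 99)  # standalone writes on node3, invisible to the oplog
node3.standalone_write("b", 2)
node1.replicated_write("c", 3)   # meanwhile the replica set accepts a new write

sync_from(node3, node1)          # node3 rejoins and catches up from the oplog
print(node1.data)                # {'a': 1, 'c': 3}
print(node3.data)                # {'a': 99, 'b': 2, 'c': 3}
print(node1.oplog == node3.oplog)  # True
```

The oplogs match, so replication considers node3 up to date, yet the data sets differ and nothing will reconcile them.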

If you follow the forced reconfiguration approach noted earlier, you would end up re-adding former members to your new replica set configuration and would not lose any data.
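Re-adding a former member is normally done from the mongo shell on the current primary with rs.add("host:port"). As a rough sketch of what that does to the config, the function below appends a member with a fresh _id and bumps the version. The helper name, the set name "rs0", and the hosts and ports are assumptions for illustration.

```python
import copy


def add_member(config: dict, host: str) -> dict:
    """Return a copy of the config with `host` appended as a new member."""
    new_config = copy.deepcopy(config)
    if any(m["host"] == host for m in new_config["members"]):
        raise ValueError(f"{host} is already a member")
    next_id = max(m["_id"] for m in new_config["members"]) + 1
    new_config["members"].append({"_id": next_id, "host": host})
    new_config["version"] += 1
    return new_config


# Hypothetical single-member config left after the forced reconfiguration.
single = {
    "_id": "rs0",
    "version": 4,
    "members": [{"_id": 2, "host": "node3:27017"}],
}

rebuilt = add_member(add_member(single, "node1:27017"), "node2:27017")
print([m["host"] for m in rebuilt["members"]])
# ['node3:27017', 'node1:27017', 'node2:27017']
```

Because node3 stays primary throughout, the re-added members sync from it rather than the other way around.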