What happened
I asked in the comments of the question that the OP provide the output of rs.status().
The reason was that the primary reverted to secondary status as soon as a single member was shut down, which made it obvious that the cluster had lost the quorum necessary to elect a primary. That can only happen when an additional voting member of the configured replica set is or becomes unavailable.
As it turned out, the arbiter of the replica set in question was not reachable by the primary, which after the shutdown of the PC was, from its point of view, the only remaining member of the replica set. It was therefore impossible to hold an election with a quorum of the configured replica set members, and the node consequently reverted to secondary state.
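The arithmetic behind this can be sketched in a few lines. This is an illustration of the majority rule, not MongoDB source code: a member can only become or remain primary if it can reach a majority of all voting members in the configuration, not just of the members currently up.

```javascript
// Majority of the *configured* voting members, not of the reachable ones.
function majority(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// A member can win or keep the primary role only with a majority in view.
function canElectPrimary(votingMembers, reachableMembers) {
  return reachableMembers >= majority(votingMembers);
}

// The situation above: 3 voting members (primary, the PC, the arbiter).
// The arbiter was unreachable and the PC was shut down, so the old
// primary saw only itself: 1 < majority(3) = 2, and it stepped down.
console.log(canElectPrimary(3, 1)); // false
console.log(canElectPrimary(3, 2)); // true
```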
How to prevent
- Always run rs.status() after setting up a replica set.
- Always run rs.status() when encountering problems with a replica set.
- Always do failure tests (down to losing write capability) and ensure your application handles those situations gracefully (as the OP did).
Using these rules, you will eliminate the vast majority of problems one can face when using a replica set.
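When reading rs.status() output, the members array is the part that exposes problems like the one above. Here is a hedged sketch of how to pick out unhealthy members; the helper name and the sample document are illustrative, shaped like real rs.status() output but not taken from it:

```javascript
// Return the names of members that are down (health === 0) in a
// document shaped like the output of rs.status().
function downMembers(status) {
  return status.members
    .filter(function (m) { return m.health === 0; })
    .map(function (m) { return m.name; });
}

// Illustrative sample, not real output: the arbiter is unreachable.
const sampleStatus = {
  members: [
    { name: "db1.example.net:27017", stateStr: "PRIMARY", health: 1 },
    { name: "db2.example.net:27017", stateStr: "SECONDARY", health: 1 },
    { name: "arb.example.net:27017", stateStr: "(not reachable/healthy)", health: 0 },
  ],
};

console.log(downMembers(sampleStatus)); // [ 'arb.example.net:27017' ]
```

Had the OP run this kind of check after setup, the unreachable arbiter would have stood out immediately.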
Personally, I think MongoDB Inc.'s Cloud Manager is a must for production environments, since it shows such problems as the OP had right away and has alerting built in.
Side note
Never, ever (and yes, that means no exception for no reason, however sound the reasons may seem to be) put an arbiter on a data bearing node of the same replica set.
Imagine the machine hosting both the arbiter and a data-bearing node goes down.
If you have a 3-member replica set, you would no longer have a quorum of the original members; your remaining member would automatically revert to secondary, losing the failover capability.
In a 5-member replica set, two voting members would be eliminated. Fine as long as all the others are up and running, right? Except it isn't fine: if another node fails, you'd lose your quorum again. So with only two machines failed, the other two nodes become more or less useless. Given the price of a virtual server today (and even the smallest ones are perfectly sufficient to run an arbiter), this simply does not make sense. You'd be paying for four data-bearing nodes anyway and lose failover capability because you tried to save a tiny fraction of the overall costs.
With a 7 member replica set, those costs become an even tinier fraction of the overall costs.
Conclusion: It is simply a bad decision business-wise to run an arbiter on the same machine as a data-bearing node, even setting aside the technical aspects.
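The colocation arithmetic from this side note can be made explicit. This is an illustrative sketch of the vote counting, not MongoDB code: an arbiter sharing a machine with a data-bearing node means one machine failure removes two votes at once.

```javascript
// Quorum requires a majority of all configured votes.
function majority(totalVotes) {
  return Math.floor(totalVotes / 2) + 1;
}

function hasQuorum(totalVotes, lostVotes) {
  return totalVotes - lostVotes >= majority(totalVotes);
}

// 3-member set, arbiter colocated: losing that one machine loses 2 votes.
console.log(hasQuorum(3, 2)); // false -> remaining member reverts to secondary
// 5-member set: the colocated machine (2 votes) plus one more node down.
console.log(hasQuorum(5, 3)); // false -> quorum lost with only two machines failed
// Same 5-member set with the arbiter on its own host: two node failures
// still cost only 2 votes, and quorum survives.
console.log(hasQuorum(5, 2)); // true
```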
MongoDB replication exists to provide automatic failover during disaster recovery, to minimize downtime during maintenance, to let you use a secondary for analytical workloads, and so on. In your scenario you are giving up these benefits during off-peak hours, and as far as I know there is no perfect workaround that lets you do these tasks without downtime.
Simply, shutting down the secondary + arbiter makes Mongo unable to
elect a new primary and the app fails
Yes, the election process will not happen in a replica set with a single node. It needs at least two mongod nodes, which will not be the case if you remove the secondary and the arbiter.
If the best/only solution is to remove the secondary and arbiter from
the replica set
You can remove the secondary and arbiter from the replica set by following the steps in this link mongo-replica-set-to-single-server.
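Under the hood, removing members amounts to filtering them out of the configuration's members array and reconfiguring with a bumped version. The shell helper rs.remove("host:port") does this for you; the sketch below simulates the document manipulation on a plain object so the mechanics are visible (the hostnames and the removeMember helper are illustrative):

```javascript
// Simulated member removal on a replica set config document,
// as rs.conf() would return it. Not a shell helper; illustration only.
function removeMember(config, hostPort) {
  return {
    _id: config._id,
    version: config.version + 1, // every reconfig must bump the version
    members: config.members.filter(function (m) { return m.host !== hostPort; }),
  };
}

const cfg = {
  _id: "rs0",
  version: 3,
  members: [
    { _id: 0, host: "db1.example.net:27017" },
    { _id: 1, host: "db2.example.net:27017" },
    { _id: 2, host: "arb.example.net:27017", arbiterOnly: true },
  ],
};

let next = removeMember(cfg, "arb.example.net:27017");
next = removeMember(next, "db2.example.net:27017");
console.log(next.members.length); // 1 -> only the former primary remains
console.log(next.version);        // 5 -> version bumped once per reconfig
```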
what are the consequences of doing when adding them
back on the next peak hours cycle?
While adding the members back, you will have to restart the mongod instances and issue rs.reconfig(), which can force the current primary to step down and trigger an election. During this window your app will not be able to read from or write to the database; this can last 10-30 seconds for reconfiguring the replica set, plus another 20-30 seconds for restarting the mongod instances, depending on network bandwidth and replica set configuration. Hence it is not recommended during peak hours.
Best Answer
As per the MongoDB documentation here, in addition to providing all the functionality of master-slave deployments, replica sets are also more robust for production use. Master-slave replication preceded replica sets and made it possible to have a large number of non-master (i.e. slave) nodes, as well as to restrict replicated operations to only a single database; however, master-slave replication provides less redundancy and does not automate failover.
Master instances store operations in an oplog, which is a capped collection. As a result, if a slave falls too far behind the state of the master, it cannot catch up and must re-sync from scratch. A slave may fall out of sync with the master for several reasons, most commonly because the capped oplog has rolled over before the slave replicated its contents.
When slaves are out of sync, replication stops, and administrators must intervene manually to restart it using the resync command. Alternatively, the --autoresync option allows a slave to restart replication automatically, after a ten-second pause, when it falls out of sync with the master. With --autoresync specified, the slave will only attempt to re-sync once in a ten-minute period.
To prevent these situations you should specify a larger oplog when you start the master instance, by adding the --oplogSize option to mongod. If you do not specify --oplogSize, mongod will allocate 5% of available disk space on startup to the oplog, with a minimum of 1 GB for 64-bit machines and 50 MB for 32-bit machines.
As MongoDB documents here, MongoDB 4.0 removes support for master-slave replication. Before you can upgrade to MongoDB 4.0, if your deployment uses master-slave replication, you must upgrade to a replica set.
Warning: Deprecated since version 3.2: MongoDB 3.2 deprecates the use of master-slave replication for components of sharded clusters.
Important: Replica sets replace master-slave replication for most use cases. If possible, use replica sets rather than master-slave replication for all new production deployments. This documentation remains to support legacy deployments and for archival purposes only.
After the OP modified the question
As documented here, MongoDB provides two options for performing an initial sync:
Restart the mongod with an empty data directory and let MongoDB's normal initial syncing feature restore the data. This is the simpler option but may take longer to replace the data.
See Automatically Sync a Member.
Restart the machine with a copy of a recent data directory from another member in the replica set. This procedure can replace the data more quickly but requires more manual steps.
See Sync by Copying Data Files from Another Member.
Reconfigures an existing replica set, overwriting the existing replica set configuration. To run the method, you must connect to the primary of the replica set.
For example:
To reconfigure an existing replica set, first retrieve the current configuration with rs.conf(), modify the configuration document as needed, and then pass the modified document to rs.reconfig().
rs.reconfig() provides a wrapper around the replSetReconfig command.
The force parameter allows a reconfiguration command to be issued to a non-primary node.
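A hedged mongo/mongosh shell sketch of the pattern just described; the priority change is purely illustrative, and the force variant should only be issued against a surviving member when no primary is reachable (it is not runnable outside a live replica set, so treat it as a template):

```javascript
// Run in the mongo/mongosh shell while connected to the primary.
cfg = rs.conf();                 // 1. retrieve the current configuration
cfg.members[1].priority = 0.5;   // 2. modify it (illustrative change)
rs.reconfig(cfg);                // 3. apply it (wraps replSetReconfig)

// From a surviving non-primary member when no primary exists (use with care):
// rs.reconfig(cfg, { force: true });
```

Because rs.reconfig() can force the current primary to step down and trigger an election, schedule it for a maintenance window, as discussed above.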
For further reference, see here, here and here.