I can't speak to 1, but for 2, I don't think your position should be pushing the business toward a specific RPO (recovery point objective). They may not be aware that they'd have to re-enter an entire day's data if things go belly up. Talk to them and find out how much data loss they're willing to tolerate. If they say 24 hours is too much, great; that indicates the current approach is insufficient for their needs. If meeting the RPO requires a hardware purchase, they will need to either provide funding or accept the current maximum data loss. Finally, document the outcome in some public place, then test your restores on a recurring basis to ensure you are able to meet that RPO.
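As a rough sketch of the arithmetic behind that conversation (the schedule and RPO figures here are made-up examples, not anything from your environment): with periodic full backups and no log backups, the worst-case data loss is simply the interval between two backups, which you can compare directly against the agreed RPO.

```python
from datetime import timedelta

def meets_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    """Worst-case data loss with periodic backups is the full interval
    between two backups; it must not exceed the agreed RPO."""
    return backup_interval <= rpo

# Nightly full backups: up to 24 hours of data can be lost.
nightly = timedelta(hours=24)

print(meets_rpo(nightly, rpo=timedelta(hours=24)))  # True: nightly backups satisfy a 24h RPO
print(meets_rpo(nightly, rpo=timedelta(hours=4)))   # False: a 4h RPO needs more frequent backups
```

If the second case is what the business actually wants, that mismatch is exactly the funding conversation described above.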
That said, there are plenty of other reasons to keep data, log (and temp) files on separate drives. Some of them are documented in this question: https://serverfault.com/questions/38511/ms-sql-layout-for-best-performance
What happened
I asked in the comments of the question that the OP provide the output of `rs.status()`.
The reason: the primary reverted to secondary status once a single member was shut down, so the cluster had evidently lost the quorum necessary to elect a primary. That can only happen when at least one additional voting member of the original replica set configuration is, or has become, unavailable.
As it turned out, the arbiter of the replica set in question was not reachable by the primary, which, after the shutdown of the PC, was the only remaining member of the replica set from its own point of view. It therefore could not hold an election with a quorum of the configured replica set members and consequently reverted to secondary state.
How to prevent
- Always run `rs.status()` after setting up a replica set.
- Always run `rs.status()` when encountering problems with a replica set.
- Always do failure tests (down to losing write capability) and ensure your application handles those situations gracefully (as the OP did).
Using these rules, you will eliminate the vast majority of problems one can face when using a replica set.
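To illustrate what to look for in that output, here is a minimal sketch that walks the `members` array of an `rs.status()`-style document and checks whether a primary could still be elected. The sample document is fabricated for illustration (the host names and a simple `health` check are assumptions, not the OP's actual output):

```python
# Sketch: inspect an rs.status()-like document (fabricated sample data)
# and report whether a primary could be elected.

def healthy_members(status):
    """Return the names of members that are reachable (health == 1)."""
    return [m["name"] for m in status["members"] if m.get("health") == 1]

def can_elect_primary(status, total_voting_members):
    """A primary needs a strict majority of the *configured* voting members,
    not just of the members currently reachable."""
    majority = total_voting_members // 2 + 1
    return len(healthy_members(status)) >= majority

sample_status = {
    "set": "rs0",
    "members": [
        {"name": "db1:27017", "stateStr": "SECONDARY", "health": 1},
        {"name": "db2:27017", "stateStr": "(not reachable/healthy)", "health": 0},
        {"name": "arbiter:27017", "stateStr": "(not reachable/healthy)", "health": 0},
    ],
}

print(healthy_members(sample_status))       # ['db1:27017']
print(can_elect_primary(sample_status, 3))  # False: 1 of 3 voters is no majority
```

This is exactly the situation the OP hit: one reachable member out of three configured voters, so no election is possible and the survivor sits at secondary.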
Personally, I think MongoDB Inc.'s Cloud Manager is a must for production environments, since it surfaces problems like the one the OP had right away and has alerting built in.
Side note
Never, ever (and yes, that means no exceptions, however sound the reasons may seem) put an arbiter on a data-bearing node of the same replica set.
Imagine the machine hosting both the arbiter and the data-bearing member goes down.
In a 3-member replica set, you would no longer have a quorum of the original members; your remaining member would automatically revert to secondary, losing failover capability.
In a 5-member replica set, two voting members would be eliminated at once. Fine as long as all the others are up and running, right? Except it isn't fine: if another node fails, you'd lose your quorum again. So with only two machines down, the remaining two nodes become more or less useless. Given the price of a virtual server today (and even the smallest ones are more than sufficient to run an arbiter), this simply does not make sense. You'd be paying for 4 data-bearing nodes anyway and lose failover capability because you tried to save a tiny fraction of the overall cost.
With a 7-member replica set, the cost of a separate arbiter server becomes an even tinier fraction of the total.
Conclusion: it's simply a bad decision business-wise to run an arbiter on the same machine as a data-bearing node, even setting aside the technical aspects.
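The quorum arithmetic above can be sketched in a few lines (member counts here are the 3- and 5-member examples from the text; nothing MongoDB-specific is assumed beyond "a primary needs a strict majority of configured voters"):

```python
def majority(voting_members: int) -> int:
    """Strict majority of the configured voting members."""
    return voting_members // 2 + 1

def has_quorum(voting_members: int, failed_votes: int) -> bool:
    """Can the surviving voters still elect a primary?"""
    return voting_members - failed_votes >= majority(voting_members)

# 3-member set (2 data nodes + arbiter), arbiter colocated with a data node:
# one machine failing takes out 2 of the 3 votes.
print(has_quorum(3, failed_votes=2))  # False: no primary can be elected

# 5-member set with a colocated arbiter: the shared machine plus one more
# node failing removes 3 of the 5 votes.
print(has_quorum(5, failed_votes=3))  # False: quorum lost with only two machines down

# Same 5-member set with the arbiter on its own host: two machines down
# still leaves 3 of 5 votes.
print(has_quorum(5, failed_votes=2))  # True
```

The last two calls show the whole point: the colocated arbiter turns a survivable two-machine failure into a loss of write capability.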
Best Answer
Yes. The oplog is somewhat analogous to MySQL's binary logs: both are used to sync data to secondaries and for point-in-time recovery.
You can use mongodump without the `--oplog` option; you will only lose the writes that occur while the backup is running. Since you have a 10-hour window, that shouldn't be an issue. In this scenario I recommend using one of your replica set members with slaveDelay.