MongoDB delayed member replication mechanism

mongodb replication

We are considering adding a delayed member to our MongoDB replica set to protect us from data corruption caused by human error, e.g. someone accidentally deleting some data.

How does a delayed member actually delay replication? As I see it, there are two possibilities:

  1. the delayed member reads the primary's oplog with a filter of ts < now - delay
  2. the delayed member reads the primary's oplog in real time, stores the oplog locally, and replays each operation once its ts <= now - delay

The reason this matters to us is that we're looking at a relatively large replication delay (24 hours). In normal operation our oplog holds about 5 days of data, so we're fine whether (1) or (2) above is used. However, at certain times our write workload is extremely high, and the oplog will occasionally contain less than 1 hour of data. In that situation (1) above will not work for us.

Best Answer

The implementation is as per your first suggestion: a delayed secondary applies operations from the source oplog based on the delay. Your concern is also valid: if the source oplog period is insufficient to cover the secondary delay, the delayed secondary will become stale and require a re-sync.
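
As a rough illustration of that condition, here is a minimal Python sketch using the figures from the question. The function name `delayed_member_goes_stale` is hypothetical, and the check assumes the simplified rule that the source oplog window must be longer than the configured delay. It is not a model of MongoDB's internal behaviour. In practice the window can be read from the "log length start to end" line reported by `rs.printReplicationInfo()` on the sync source.

```python
from datetime import timedelta


def delayed_member_goes_stale(oplog_window: timedelta, delay: timedelta) -> bool:
    """Return True if the source oplog window does not cover the delay.

    Simplifying assumption: the delayed member still has to find its next
    operation in the source oplog, so the window must exceed the delay.
    """
    return oplog_window <= delay


delay = timedelta(hours=24)

# Normal load: the oplog holds about 5 days of operations.
print(delayed_member_goes_stale(timedelta(days=5), delay))

# Heavy write load: the oplog holds less than 1 hour of operations.
print(delayed_member_goes_stale(timedelta(hours=1), delay))
```

With a 24 hour delay, the first case is safe and the second is not, so the oplog would have to be sized for the peak write rate rather than the average one.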

A delayed secondary will help you with some data recovery scenarios (for example, a collection drop that hasn't been applied yet), but for more comprehensive recovery options I would consider a continuous backup solution such as MongoDB Cloud Manager (or MongoDB Ops Manager, which is the on-premise equivalent). Cloud Manager Backup provides continuous backups with queryable snapshots that allow you to preview and restore a subset of data. Base snapshots are taken every 6 hours by default, with a customisable snapshot interval and retention policy.