MongoDB Replication – Synchronizing Delayed Replicas in Sharded Cluster

backup, mongodb, replication

Please take a look at the diagram of my MongoDB deployment at http://postimg.org/image/krzvmep4d/

What keeps two delayed replicas in sync with each other when each belongs to a different replica set in the same sharded cluster? How can I use my delayed backup if the two "halves" are not synchronized, because I shut down one of my production servers first (using db.shutdownServer()) and shut down the other one a few minutes later?

Or is my MongoDB layout itself flawed? If so, please explain what I am doing wrong.

P.S. Re-added from here https://stackoverflow.com/questions/22715589/what-synchronizes-two-delayed-replicas-which-is-part-of-the-replica-set-includ

Best Answer

There is an obvious problem with your "delayed backup" model, in that your delayed secondaries will reflect the state of each replica set but not the full state of the sharded cluster at a given point in time.

A simple example:

  • there is a chunk migration from shard1 => shard2 in progress
  • documents will exist on both shard1 and shard2 while they are being copied over
  • your "delayed backup" does not have the matching config metadata to interpret the state of the cluster (and it will be changing over time, unless you disable the balancer)
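The remark about disabling the balancer can be sketched as a small helper. This is a minimal sketch, not a complete backup procedure: `runWithBalancerStopped` and `backupFn` are hypothetical names, and `sh` is assumed to be the sharding helper object available in a mongo shell connected to a mongos (`sh.stopBalancer()` and `sh.startBalancer()` are the shell helpers for pausing and resuming chunk migrations).

```javascript
// Sketch: pause chunk migrations while a backup step runs, then resume them.
// Assumptions: `sh` is the mongo shell sharding helper (connected via mongos);
// `backupFn` is a hypothetical callback that performs the actual backup work.
function runWithBalancerStopped(sh, backupFn) {
  // Ask the balancer to stop so cluster metadata stays stable during the backup.
  sh.stopBalancer();
  try {
    return backupFn();
  } finally {
    // Always re-enable the balancer, even if the backup step throws.
    sh.startBalancer();
  }
}
```

In the shell this might be called as `runWithBalancerStopped(sh, function () { /* take snapshots here */ })`. Stopping the balancer only helps for backups taken while it is stopped; it does not make an hour-old or week-old delayed secondary consistent with the current config metadata.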

Depending on the length of the replication delay (your diagram mentions "hour" and "week"), multiple changes to the cluster metadata may have happened and the config data will be very out of sync.

You will have a delayed copy of the data, but if it isn't in sync with the sharded cluster metadata, your path to a full restore will be reloading (and resharding) the data. You will also have to resolve duplicate documents, which can exist on two shards at once due to in-progress migrations.
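As an illustration of the duplicate problem, here is a minimal sketch of merging documents dumped from several shards while keeping one copy per `_id`. The function name `dedupeById` and the input shape (an array of per-shard document arrays) are assumptions made for the example. It also assumes `_id` values are primitives and that the first copy seen is the one to keep; in a real restore you would have to decide which shard's copy is authoritative.

```javascript
// Sketch: merge per-shard document arrays, keeping one document per _id.
// Assumptions: _id values are primitives (numbers or strings), and the
// first copy encountered wins; later copies are reported as duplicates.
function dedupeById(shardDumps) {
  const seen = new Map();
  const duplicates = [];
  for (const docs of shardDumps) {
    for (const doc of docs) {
      // Include the type in the key so that 1 and "1" are not confused.
      const key = typeof doc._id + ":" + String(doc._id);
      if (seen.has(key)) {
        duplicates.push(doc);
      } else {
        seen.set(key, doc);
      }
    }
  }
  return { unique: Array.from(seen.values()), duplicates: duplicates };
}

// Example: _id 2 was mid-migration and exists on both shards.
const shard1Docs = [{ _id: 1, v: "a" }, { _id: 2, v: "b" }];
const shard2Docs = [{ _id: 2, v: "b" }, { _id: 3, v: "c" }];
const merged = dedupeById([shard1Docs, shard2Docs]);
```

Here `merged.unique` holds three documents and `merged.duplicates` holds the second copy of `_id` 2. Listing shard1 first means its copy is kept, which is only a reasonable choice if shard1 was still the owner of that chunk at the time of the backup.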

For more information see the Backup and Restore Sharded Cluster tutorials in the MongoDB documentation. If you have a large amount of data, you will typically want to use the Filesystem snapshot approach to create an approximate point-in-time backup.