MongoDB GridFS Offsite Backups

backup mongodb

We are using MongoDB GridFS to create a document archive. I am creating a sharded cluster at our Primary DataCenter. I need a working backup at our offsite Disaster Recovery DataCenter. This will be a very large database, ~6TB. I will be adding 20k new documents daily. I need to be able to take incremental backups at the end of our batch cycle and apply them to the offsite cluster. We do not want to use replication because we want the clusters to be entirely independent, but essentially identical. We will never delete from the collection. We are running Red Hat.

Does anyone have any suggestions about how to keep the databases in sync?

Best Answer

You can look into tailing the oplogs of the shards yourself and sending the data incrementally to a cluster in the remote data center. Alternatively, if you don't want to do all the work from scratch, you can look into tools like mongoriver from Stripe or mongo-connector from MongoDB labs (but then you are relying on a third party keeping that tool up to date, or on an unsupported labs project). You will still need to wrap these utilities in some logic based on your needs and do some manual verification to validate the data between the two sites. Also note that getting point-in-time snapshots of a sharded cluster is non-trivial.
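To give a rough idea of what the "do it yourself" route looks like, here is a minimal sketch in Python with PyMongo. It is not a tested backup tool. It assumes the archive lives in a database called `archive` with the default GridFS bucket `fs` (both names are made up for illustration), that you run it once per shard against that shard's replica set directly (mongos has no oplog), and that you persist the returned timestamp yourself between batch cycles. The function names are mine as well. Since you never delete and GridFS documents are not modified after upload, it only replays inserts, and it skips entries flagged `fromMigrate`, which come from chunk migrations between shards rather than from your application.

```python
from typing import Any, Mapping, Optional, Tuple

# Hypothetical namespaces: database "archive", default GridFS bucket "fs".
GRIDFS_NAMESPACES = frozenset({"archive.fs.files", "archive.fs.chunks"})


def insert_to_replay(
    entry: Mapping[str, Any],
    namespaces: frozenset = GRIDFS_NAMESPACES,
) -> Optional[Tuple[str, str, Any]]:
    """Return (db, collection, document) for an oplog insert worth replaying, else None."""
    if entry.get("op") != "i":
        return None  # only inserts matter for an append-only archive
    if entry.get("fromMigrate"):
        return None  # written by the balancer during a chunk migration
    ns = entry.get("ns", "")
    if ns not in namespaces:
        return None
    db_name, _, coll_name = ns.partition(".")
    return db_name, coll_name, entry["o"]


def catch_up_shard(shard_uri: str, target_uri: str, last_ts: Any) -> Any:
    """Replay inserts newer than last_ts from one shard's oplog into the target cluster.

    last_ts is the bson Timestamp saved by the previous run. Returns the
    timestamp of the last oplog entry seen, to be saved as the next checkpoint.
    """
    # Imported here so insert_to_replay can be exercised without a driver installed.
    from pymongo import MongoClient
    from pymongo.errors import DuplicateKeyError

    source = MongoClient(shard_uri)   # the shard's replica set, not mongos
    target = MongoClient(target_uri)  # mongos of the offsite cluster
    oplog = source["local"]["oplog.rs"]

    # Read in natural (insertion) order, which is the order the ops were applied.
    cursor = oplog.find({"ts": {"$gt": last_ts}}).sort("$natural", 1)
    for entry in cursor:
        replay = insert_to_replay(entry)
        if replay is not None:
            db_name, coll_name, doc = replay
            try:
                target[db_name][coll_name].insert_one(doc)
            except DuplicateKeyError:
                pass  # already copied by an earlier, interrupted run
        last_ts = entry["ts"]
    return last_ts
```

This reads the oplog in one pass at the end of the batch cycle rather than keeping a tailable cursor open, which seemed closer to what you described. It does nothing about oplog rollover (if the oplog window is shorter than the gap between runs you will lose entries), about inserts performed inside multi-document transactions, or about verifying that the two sites match, so treat it as a starting point only.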

The only supported, "out of the box" way to do this (that I am aware of) is to use the on-premises backup capabilities of Ops Manager, which is a paid offering from MongoDB themselves (full disclosure: I used to work for MongoDB). Keeping a full copy of a sharded cluster in a separate location is a complicated process, so really your choices are to invest your own time and resources, or pay for a product to take care of it for you.