Mongodb – Reliability of mongodump

mongodbmongodump

I understand that the recommended way to backup MongoDB data is to use file system level tools. However, mongodump utility is promoted as a valid alternative for smaller instances, when file system snapshots are not available for some reason. Many resources recommend running it with –oplog option to enable point-in-time restores.

I tried to understand how –oplog could enable point-in-time snapshots without proper multi-version concurrency control from the database engine. I think it can only work in the following way:

We assume that _id values in all collections are monotonously increasing
Mongodump blocks all writes and records last operation id in the oplog
Mongodump starts dumping documents from all collections by navigating _id index in decreasing order
After dump has started, mongodump unlocks writes to the collections
We assume that all inserts from now on will have higher _id values and will appear only in oplog dump, not collections dump.

This system quickly breaks if _id values are not monotonously increasing or if write locks are not implemented and there is a race condition. Also, there is no support for secondary unique indexes.

From what I see, the following statements are true, meaning that mongodump cannot possibly enable point-in-time restores, so it's specification is misleading as it comes to its snapshot-like capabilities:

Even build-in _id values generators do not guarantee total order
mongodump does not issue any write locks
mongodump does not traverse _id index in descending order, so inserted records will appear both in collection dump and oplog dump.

Is my understanding correct? Is there any value in –oplog option of mongodump utility?

Following are the tickets which seem to enforce my suggestions:

Best Answer

Is my understanding correct? Is there any value in --oplog option of mongodump utility?

As MongoDB Blog documentation here Ensures that mongodump creates a dump of the database that includes a partial oplog containing operations from the duration of the mongodump operation. This oplog produces an effective point-in-time snapshot of the state of a mongod instance. To restore to a specific point-in-time backup, use the output created with this option in conjunction with mongorestore --oplogReplay. Without --oplog, if there are write operations during the dump operation, the dump will not reflect a single moment in time. Changes made to the database during the update process can affect the output of the backup.

--oplog has no effect when running mongodump against a mongos instance to dump the entire contents of a sharded cluster. However, you can use --oplog to dump individual shards.

--oplog only works against nodes that maintain an oplog. This includes all members of a replica set, as well as master nodes in master/slave replication deployments.

--oplog does not dump the oplog collection.

I understand that the recommended way to backup MongoDB data is to use file system level tools. However, mongodump utility is promoted as a valid alternative for smaller instances, when file system snapshots are not available for some reason. Many resources recommend running it with --oplog option to enable point-in-time restores.

You could use oplog option if you are taking backups of all the DBs.

mongodump  --oplog --out /dump_location

Now while restoring do something like this :

mongorestore --oplogReplay

This way mongodump won't require a write lock. Any writes that are made during the backup process will be written to oplog.bson. While restoring, mongorestore will use this oplog to include the write operations.

For your further ref here, here and here

Related Solutions

Mongodb – Backing up a MongoDB cluster with mongodump

Upgrading to MongoDB 3.4 was necessary, since the cluster string did not appear to work with 3.2.

Also, based on the blog post for 'importing data', a small tweak to the host parameter seemed necessary, in the form of adding the 'Cluster0-shard-0/' prefix :

mongodump --host "Cluster0-shard-0/cluster0-shard-00-00-2djno.mongodb.net:27017,cluster0-shard-00-01-2djno.mongodb.net:27017,cluster0-shard-00-02-2djno.mongodb.net:27017" \
  --authenticationDatabase admin --ssl --username admin \
  --password mypassword --oplog -o /tmp

Edit: updated the response based on @jjussi's feedback. Also, we aren't trying to avoid using the mongo cloud backup, which is more convenient, it is that we just feel better having our own local copy, for various reasons.

Mongodb – difference between mongodump and mongoexport

mongodump generates binary copies of data; it creates better, more efficient backups.

mongoexport can create JSON files; these can be used by other programs, and are basically human-readable as is.

Best Answer

Related Solutions

Mongodb – Backing up a MongoDB cluster with mongodump

Mongodb – difference between mongodump and mongoexport

Related Question