MongoDB Backup – Large mongodump Followed by mongorestore

backup, mongodb, restore

I have a fairly large (~240GB) Mongo database that I need to transfer across a sluggish network and onto a new server.

Traditionally in these situations, I've found that a mongodump followed by a mongorestore is much faster than the db.cloneCollection() method. However, I realized today that a full mongodump followed by a mongorestore is a bit wasteful (I think), since it first does all of the data transfer and only then does all of the insertions.

I would prefer to transfer data from the old mongo (the mongodump step) while simultaneously inserting available data into the new database (the mongorestore step).

Does anyone know how to parallelize the dumping and inserting process in MongoDB? (And would this actually be faster?)

Best Answer

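First of all, you can pipe mongodump directly into mongorestore. With MongoDB 3.2 or later, both tools need the --archive option for this, so that the dump is written to stdout and read from stdin instead of going to a dump directory:

mongodump -h sourceHost -d yourDatabase --archive … | mongorestore -h targetHost -d yourDatabase --archive …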

This reduces the total time, because each document is restored on targetHost more or less as soon as it has been read, instead of waiting for the whole dump to finish.

However, this has the disadvantage that you might run into problems if the procedure is aborted for some reason, for example because of a network failure: there is no dump on disk to resume from. As for parallelization, you could do the above for each collection, but I doubt that you would gain any performance, as the limiting factor is most likely IO. Even if it is not, contention between the concurrent streams would most likely eat up any gain.
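As a minimal sketch of the per-collection variant, the function below pipes one collection at a time (sequentially, not in parallel) and remembers which ones failed, so that an aborted run only costs the collection that was in flight. It assumes the 3.2+ tools with --archive; the function name copy_collections and the idea of passing the collection names as arguments are illustrative choices, not something the tools prescribe. Note that --drop deletes the target collection before restoring it, which is what makes a rerun start clean.

```shell
#!/usr/bin/env bash
# Enable pipefail where the shell supports it, so that a failing mongodump
# is not masked by a mongorestore that exits cleanly.
(set -o pipefail) 2>/dev/null && set -o pipefail

# Usage: copy_collections SOURCE_HOST TARGET_HOST DATABASE COLLECTION...
# (hypothetical helper; all arguments are placeholders you supply)
copy_collections() {
  src=$1 dst=$2 db=$3
  shift 3
  failed=""
  for coll in "$@"; do
    # --archive without a file name writes to stdout / reads from stdin.
    # --drop removes the target collection first, so a retry starts clean.
    if mongodump --host "$src" --db "$db" --collection "$coll" --archive |
       mongorestore --host "$dst" --archive --drop; then
      echo "copied: $coll"
    else
      echo "FAILED: $coll" >&2
      failed="$failed $coll"
    fi
  done
  # Non-zero exit status if at least one collection needs another attempt.
  [ -z "$failed" ] || { echo "rerun for:$failed" >&2; return 1; }
}
```

A call would look like copy_collections sourceHost targetHost yourDatabase users orders, where users and orders stand in for your own collection names. After a failure, the same call can be repeated with only the collections listed in the "rerun for:" message.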

What I would do is create a temporary replica set, consisting of the old server, the new server and an arbiter. The initial sync is rather fast, and even if you run into network problems, the sync mechanism will make sure that everything is fine. After the initial sync is done, you simply have the old server step down as primary and restart the new server without the replSetName option, making it a standalone again. You can then connect to the new server, which now holds all of the data.
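The steps above could be scripted roughly as follows. This is a sketch under several assumptions: all three mongod processes have already been started with the same replSetName, the legacy mongo shell is on the PATH (newer installations ship mongosh instead), and the host names, the helper names (run_js, wait_for, migrate_via_replset, step_down_old_primary) and the polling interval are made up for illustration. The shell's exit status only catches connection errors and exceptions, so the real check is the wait for the new member to report SECONDARY; if rs.add() was rejected, that wait never ends and needs a manual look.

```shell
#!/usr/bin/env bash
MONGO_SHELL="${MONGO_SHELL:-mongo}"   # legacy shell; set to "mongosh" on newer installs
POLL_SECONDS="${POLL_SECONDS:-30}"    # hypothetical polling interval

# Evaluate one JavaScript snippet ($2) on a host ($1) and print its output.
run_js() {
  "$MONGO_SHELL" --quiet --host "$1" --eval "$2"
}

# Poll host $1 until the snippet $2 prints exactly the value $3.
wait_for() {
  until [ "$(run_js "$1" "$2")" = "$3" ]; do
    sleep "$POLL_SECONDS"
  done
}

# $1 = old server, $2 = new server, $3 = arbiter, each given as host:port.
migrate_via_replset() {
  run_js "$1" 'rs.initiate()' || return 1
  # The old server needs a moment to elect itself primary.
  wait_for "$1" 'print(db.isMaster().ismaster)' true
  run_js "$1" "rs.add('$2')" || return 1
  run_js "$1" "rs.addArb('$3')" || return 1
  # The initial sync is finished once the new member reports SECONDARY.
  wait_for "$1" "rs.status().members.forEach(function (m) { if (m.name == '$2') print(m.stateStr); })" SECONDARY
  echo "initial sync done; cut over whenever convenient"
}

# Cutover: run against the old server ($1) when you are ready to switch.
# On older servers the connection may be dropped during step-down, so a
# non-zero exit status here is not necessarily a failure.
step_down_old_primary() {
  run_js "$1" 'rs.stepDown()' || true
}
```

A run would be migrate_via_replset oldHost:27017 newHost:27017 arbiterHost:27017, followed at cutover time by step_down_old_primary oldHost:27017. Restarting the new mongod without replSetName remains a manual step, since it depends on how the service is managed on that machine.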

The advantage is that this works with minimal downtime and needs no attendance. After you have initialized the replica set, your application can still use the old server, and even new data will automatically be transferred to the new server. So you do not have to sit beside it, and we all know that the process is likely to finish at 3am, regardless of the time zone. ;) Any time after the initial sync is finished (even hours or days later), you can make the new server a standalone and change your application's connection string to point to the new server. This is a matter of 2 or 3 minutes, if planned properly.

Edit

This method has a drawback, however: the members of a replica set may differ by only one release, so you can only migrate from one release to the next higher one. To migrate from 2.4 (archeology as of the time of this writing) to 3.4 (cutting edge), you would have to repeat the process multiple times:

  • From 2.4 to 2.6
  • From 2.6 to 3.0
  • From 3.0 to 3.2
  • From 3.2 to 3.4