MongoDB – mongodump: are 4 shards with a total data size of 575 GB considered a small sharded cluster?

backup, mongodb, mongodb-3.2, mongodump, restore

I need to test a restore with Ops Manager. For this I "clone" the production sharded cluster: I create VMs of the same size as production and run mongodump/mongorestore (Ops Manager deployment). My restore test doesn't need a consistent copy; it is not a problem for me if around 5 GB is missing.

DATA SIZE: 573.6 GB

shard0: 142.6 GB
shard1: 145.94 GB
shard2: 142.55 GB
shard3: 142.52 GB

For simplicity, I would like to run mongodump and pipe its output to mongorestore through the mongos.
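A minimal sketch of such a pipe, assuming the MongoDB 3.2+ tools, where `--archive` without a file name streams to stdout and reads from stdin. The host names are hypothetical placeholders, and the sketch does nothing about consistency or the balancer:

```python
import subprocess


def build_pipe_commands(src_mongos, dst_mongos):
    """Return (dump_argv, restore_argv) for streaming a dump between two mongos."""
    # --archive with no file name writes to stdout / reads from stdin.
    dump = ["mongodump", "--host", src_mongos, "--archive"]
    restore = ["mongorestore", "--host", dst_mongos, "--archive"]
    return dump, restore


def run_pipe(src_mongos, dst_mongos):
    """Run the equivalent of `mongodump | mongorestore`; return both exit codes."""
    dump_argv, restore_argv = build_pipe_commands(src_mongos, dst_mongos)
    dump = subprocess.Popen(dump_argv, stdout=subprocess.PIPE)
    restore = subprocess.Popen(restore_argv, stdin=dump.stdout)
    dump.stdout.close()  # so mongodump gets SIGPIPE if mongorestore exits early
    restore_rc = restore.wait()
    dump_rc = dump.wait()
    return dump_rc, restore_rc


if __name__ == "__main__":
    # Hypothetical host names; replace with the real source and target mongos.
    print(run_pipe("prod-mongos.example.net:27017", "test-mongos.example.net:27017"))
```

Because nothing is written to disk, a failure on either side means starting the whole transfer again.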

I found an old doc (v3.0), Backup a Small Sharded Cluster with mongodump. This documentation no longer exists for newer MongoDB versions.

If your sharded cluster holds a small data set, you can connect to a
mongos using mongodump.

What is a small data set in GB? See above for my deployment.

If you use mongodump without specifying a database or collection,
mongodump will capture collection data and the cluster meta-data from
the config servers.

So I don't need to explicitly back up the config server replica set?

When restoring data to sharded cluster, you must deploy and configure
sharding before restoring data from the backup. See Deploy a Sharded
Cluster for more information.

In plain English, does this mean I need to enable sharding and define the shard key before the restore?
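As a rough sketch of what "deploy and configure sharding before restoring" could look like on the target cluster: the snippet below builds the `enableSharding` and `shardCollection` admin commands and optionally sends them through a mongos with the third-party pymongo driver. The database name, collection name, and shard key are hypothetical; they would have to match whatever production uses.

```python
def sharding_commands(database, collection, shard_key):
    """Build the admin commands that enable sharding and shard one collection."""
    namespace = "{}.{}".format(database, collection)
    # The command name must be the first key; Python 3.7+ dicts keep insertion order.
    return [
        {"enableSharding": database},
        {"shardCollection": namespace, "key": shard_key},
    ]


def apply_commands(mongos_uri, commands):
    """Send each command to the admin database through a mongos (needs pymongo)."""
    from pymongo import MongoClient  # imported lazily; only needed when applying

    client = MongoClient(mongos_uri)
    for cmd in commands:
        client.admin.command(cmd)


if __name__ == "__main__":
    # Hypothetical namespace and shard key, for illustration only.
    for cmd in sharding_commands("shop", "orders", {"customerId": 1}):
        print(cmd)
```

These commands would be run against the empty target cluster before mongorestore starts, so that restored documents are distributed across the shards as they are inserted.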

Am I missing any steps or other important things?

Best Answer

I found an old doc (v3.0), Backup a Small Sharded Cluster with mongodump. This documentation no longer exists for newer MongoDB versions.

This procedure is only for backing up the data from a small sharded cluster and does not cover recreating the sharded environment or capturing a point-in-time backup. As you've noticed, there is no mention of backing up the config server data or other essential steps that would be required for a sharded environment (for example, stopping the balancer). This procedure would perhaps be suitable for backing up data from a development or staging environment, but is not recommended for a typical production environment.

For a more complete sharded backup procedure using mongodump, see: Back Up a Sharded Cluster with Database Dumps. Please ensure the version of the documentation you are referencing matches your MongoDB release series as there may be notable differences.

However, you mentioned using MongoDB Ops Manager which includes a specific feature for backing up sharded clusters. If you choose the option for a manual restore, Ops Manager will provide archive files to restore the config servers and shards. Since Ops Manager licensing is part of a MongoDB Enterprise subscription, I would recommend raising a commercial support case with MongoDB if you need recommendations or clarification on any of the procedures or your requirements.

What is a small data set in GB?

There isn't an absolute number. General factors include resource challenges such as the size of your data relative to RAM, available network bandwidth, and how quickly your data changes. Typically, if you have sufficient data or workload to warrant sharding, you have also outgrown mongodump as a backup approach.

mongodump has to read all of the data, pulling it through memory on each shard, which will have a significant impact on the shards' working sets if your data is much larger than available RAM. You also need enough disk space to save a complete backup (or a compressed backup for MongoDB 3.2+) of the data dumped via a single mongos, enough network bandwidth to cope with the increased traffic, and so on.
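To put the bandwidth point into numbers, here is a back-of-the-envelope estimate using the shard sizes from the question. The 1 Gbit/s link and the 50% efficiency figure are assumptions for illustration only. The result is a lower bound on wire time and ignores read, write, and index-build costs on both clusters.

```python
# Shard sizes in GB, taken from the question.
SHARD_SIZES_GB = {"shard0": 142.6, "shard1": 145.94, "shard2": 142.55, "shard3": 142.52}


def transfer_hours(size_gb, link_gbit_per_s, efficiency=1.0):
    """Hours needed to move size_gb (decimal GB) over a link at the given efficiency."""
    bits = size_gb * 8e9
    seconds = bits / (link_gbit_per_s * 1e9 * efficiency)
    return seconds / 3600.0


total_gb = sum(SHARD_SIZES_GB.values())
print("total: {:.2f} GB".format(total_gb))
# Assumed link speed: 1 Gbit/s, first at ideal throughput, then at 50% efficiency.
print("ideal: {:.2f} h".format(transfer_hours(total_gb, 1.0)))
print("50% efficiency: {:.2f} h".format(transfer_hours(total_gb, 1.0, 0.5)))
```

Even under the ideal assumption, all of this traffic passes through a single mongos, which is one reason a dump of this size is awkward to run against production.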

For your specific use case, mongodump definitely isn't a recommended strategy, for several strong reasons:

  • this is a production environment
  • you want to clone/recreate the sharded cluster in another environment
  • you have access to MongoDB Ops Manager for backup