Question:
I have a sharded MongoDB cluster and I want to migrate the data from this cluster to a single-node MongoDB instance. Is it possible to use dump/restore for this?
I use pymongo, Django, Python 3, and MongoDB 3.4.

If the answer is no, then what is the best way to migrate data from a MongoDB cluster to a single-node MongoDB instance?
I tried to write a script to migrate the data:
```python
# fetch data from the old cluster
data = MongoCollection._get_collection().find(query)
# insert data into the new single-node instance
NewMongoCollection.insert(data)
```
That approach had a lot of issues, such as:
- It fetches all the data at once and holds it in RAM
- If something goes wrong, I have to start over from the beginning
- …
Best Answer
It's definitely possible to `mongodump` from a sharded cluster and `mongorestore` to another deployment (standalone, replica set, or sharded cluster). With this approach there are some general considerations to be aware of:
- `mongodump` data via `mongos` for a sharded cluster.
- `mongodump` needs to read all requested data through the memory of the `mongod` process(es), so this can have a significant resource and performance impact for the deployment being backed up (particularly if the uncompressed data set is much larger than available RAM).
- `mongorestore` will rebuild all indexes for collections being restored, which can have a significant performance impact on the target cluster. If you are using a version of MongoDB server older than 4.2, the default foreground index builds will block other reads & writes to the destination database. MongoDB 4.2 has a less impactful index build process (see: Index Builds on Populated Collections).
- `mongodump` does not take a point-in-time backup, so if the source cluster is being actively updated your backup may not be consistent. If you need a consistent copy of an actively updated cluster, `mongodump`/`mongorestore` should not be used (the lack of point-in-time consistency is particularly problematic in this case).

Your original approach of writing a script to migrate the data would also be viable if you iterate rather than trying to store the whole result set in RAM. You could make this approach resumable by scanning a collection in a predictable order (such as sorting by `_id`, assuming this is monotonically increasing if new documents continue to be inserted).

Unless you have very custom requirements for data export or throttling, I would be inclined to use
`mongodump` and `mongorestore` with appropriate options for compression and concurrency (for example, parallel collections to dump and number of insertion workers per collection).
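For example, a dump/restore using those options might look like this (hostnames, ports, paths, and worker counts here are placeholders — tune them to your deployment):

```shell
# Dump via a mongos of the sharded cluster, compressing output
# and dumping 4 collections in parallel.
mongodump --host mongos.example.net --port 27017 \
  --gzip --numParallelCollections=4 --out /backup/dump

# Restore into the single-node instance, reading the same
# compressed dump with 4 insertion workers per collection.
mongorestore --host standalone.example.net --port 27017 \
  --gzip --numInsertionWorkersPerCollection=4 /backup/dump
```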
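If you do go the custom-script route instead, the resumable pattern described above (streaming the cursor in `_id` order and checkpointing the last copied `_id`) could be sketched roughly as follows. The `migrate` helper and all connection names are illustrative, not part of pymongo:

```python
BATCH_SIZE = 1000  # documents per insert_many call; tune for your document size

def migrate(source_coll, target_coll, last_id=None):
    """Copy documents from source_coll to target_coll in _id order.

    Streams the cursor instead of materialising the whole result set
    in RAM, and returns the last copied _id so a caller can persist it
    and resume after a failure (by passing it back in as last_id).
    Assumes _id is monotonically increasing for new documents.
    """
    query = {"_id": {"$gt": last_id}} if last_id is not None else {}
    batch = []
    for doc in source_coll.find(query).sort("_id", 1):
        batch.append(doc)
        if len(batch) >= BATCH_SIZE:
            target_coll.insert_many(batch)
            last_id = batch[-1]["_id"]
            batch = []
    if batch:  # flush the final partial batch
        target_coll.insert_many(batch)
        last_id = batch[-1]["_id"]
    return last_id

# Usage with pymongo (hostnames and database/collection names are placeholders):
#   from pymongo import MongoClient
#   source = MongoClient("mongodb://mongos.example.net:27017")["mydb"]["mycoll"]
#   target = MongoClient("mongodb://standalone.example.net:27017")["mydb"]["mycoll"]
#   checkpoint = migrate(source, target)  # persist checkpoint to resume later
```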