MongoDB – resizing a MongoDB replica set on AWS

aws mongodb

I have a replica set consisting of 3 Amazon EC2 nodes: a primary, a secondary, and an arbiter.
The data volume size is ~400GB and it is currently 90% full. I need to resize the volumes to a more appropriate size.
I'm looking for the best way to do this with minimum downtime (ideally none) and minimum resource overhead – meaning using the least amount of CPU/IO/network needed for the resize process. Time is not a big issue, as I believe it wouldn't take more than 24 hours.

Here are the logical steps I have come up with so far:

  1. Create, attach & mount a new volume for the secondary
  2. Stop the mongod process on the secondary
  3. Copy the data from the old volume to the new volume
  4. Restart the mongod process on the secondary with dbpath pointing to the new volume, OR swap the mount points for the old and new volumes and then restart the mongod process (see the sketch after this list)
  5. Ensure the secondary finishes syncing the remaining data
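
Roughly, steps 2-4 would look something like this (device names, mount points, and paths are examples from my setup, not prescriptions):

    # Stop mongod on the secondary before touching its data files
    sudo service mongod stop

    # Copy the data files from the old volume to the new one
    sudo rsync -a /data/db/ /mnt/new-volume/db/

    # Either point dbpath in /etc/mongod.conf at /mnt/new-volume/db,
    # or swap the mount points so the new volume appears at the old path, then:
    sudo service mongod start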

===============================================================

  1. Create, attach & mount a new volume for the primary
  2. Step down the primary (see the stepDown sketch after this list)
  3. Ensure all client processes failed over correctly to the new primary
  4. Repeat steps 2-5 above on the former primary (now the secondary)
  5. [optional] Step down the new primary to return to the original setup
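
For step 2, the stepdown itself is a single shell helper run against the primary (the 60-second no-reelection window is just an example value):

    # Ask the primary to step down and not seek re-election for 60 seconds.
    # Note: the shell connection is dropped when the primary steps down.
    mongo --eval "rs.stepDown(60)"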

My questions:

a) When copying files from one volume to another, assuming the bigger volume is also faster (more provisioned IOPS), is it efficient to parallelize – i.e. copy up to ~10 files at a time?

b) Some would say that if I'm already putting in the effort to copy the data, I should simply wipe the data and let it resync from scratch – however, I'm worried about the network and read impact on the "live" primary, which may degrade performance for my applications. Is that a fair assumption?

c) How can I measure/verify that the oplog retention is sufficient for the secondary to catch up after the files have been copied?

d) Is there a better idea/solution?

MongoDB version - 3.0.10
Storage engine - MMAPv1
OS - Amazon Linux
Journaling is on

Thanks

Best Answer

a) When copying files from one volume to another, assuming the bigger volume is also faster (more provisioned IOPS), is it efficient to parallelize – i.e. copy up to ~10 files at a time?

Limiting factors for your transfer are likely to be the read speed of your original storage and the bandwidth between the old and new storage paths. Starting parallel copies is unlikely to speed up the process, and could actually slow down the overall transfer if the copy requires reading multiple files that are randomly located on slow origin storage (e.g. non-SSD).

Some EBS volumes (notably those restored from snapshots) may also benefit from initialization or prewarming to achieve maximum performance after being attached. See: Initializing Amazon EBS Volumes in the EC2 User Guide.
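
If your new volume was restored from a snapshot, initialization boils down to reading every block once before putting the volume into service; a minimal sketch, assuming the volume is attached as /dev/xvdf (check yours with lsblk):

    # Read every block once to force the EBS volume to be initialized
    sudo dd if=/dev/xvdf of=/dev/null bs=1M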

b) Some would say that if I'm already putting in the effort to copy the data, I should simply wipe the data and let it resync from scratch – however, I'm worried about the network and read impact on the "live" primary, which may degrade performance for my applications. Is that a fair assumption?

With MMAPv1 there are some possible benefits to copying the data files vs a resync:

  • You could potentially compress the data files before transferring (see the sketch after this list), which may save transfer time if the files compress well.
  • The new data volume will be usable sooner, as the data files and indexes are already built.
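
For example, if the copy goes over the network you could stream a compressed tarball over SSH instead of copying raw files; the hostnames and paths below are placeholders:

    # Stream the stopped secondary's data files, gzip-compressed, to the new host
    tar -czf - -C /data db | ssh user@new-host 'tar -xzf - -C /mnt/new-volume'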

Some possible benefits of resyncing are:

  • If you have excessive fragmentation or storage usage, the actual data to transfer may be less than the file size on disk.
  • A resync saves the overhead of transferring index data, at the expense of rebuilding the indexes on the destination node.
  • You could change the storage engine to WiredTiger, which includes on-disk compression. If your main concern is disk space, this might alleviate some of your resource pressure. Since this is a more significant change than simply increasing storage space, you definitely want to test against your application in a staging/UAT environment first (a sketch follows below).
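
As a sketch: in MongoDB 3.0 you would switch the resyncing node to WiredTiger by starting it against an empty dbpath with the engine flag and letting initial sync repopulate it (the dbpath and replica set name below are placeholders):

    # Start the wiped secondary with WiredTiger; initial sync rebuilds its data
    mongod --storageEngine wiredTiger --dbpath /mnt/new-volume/db --replSet rs0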

c) How can I measure/verify that the oplog retention is sufficient for the secondary to catch up after the files have been copied?

You could estimate how long the file transfer will take and ensure the oplog window is a comfortable multiple of the worst-case transfer time. I strongly recommend avoiding any approach which leaves you without a viable secondary while you are copying/syncing data; racing against the oplog duration is risky if something goes amiss in the copy/sync process and it takes much longer than you planned for.
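
You can check the current oplog window on the primary with the rs.printReplicationInfo() shell helper, which reports the configured oplog size and the time span between the first and last oplog entries:

    # "log length start to end" is the replication window you're racing against
    mongo --eval "rs.printReplicationInfo()"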

d) Is there a better idea/solution?

You currently have a Primary/Secondary/Arbiter configuration. Instead of compromising replication by stopping your only secondary in order to copy the files, I recommend adding a new secondary with increased storage and removing the arbiter (it is no longer needed once you have an odd number of voting, data-bearing nodes).
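
That reconfiguration is a couple of shell helpers run against the primary; the hostnames and port below are placeholders:

    # Add the new, larger-storage secondary, then drop the arbiter
    mongo --eval 'rs.add("new-secondary.example.net:27017")'
    mongo --eval 'rs.remove("arbiter.example.net:27017")'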

Once your new secondary completes its initial sync you can resize and resync the other secondary in the same way, and finally step down the primary. At that stage you could either drop the former primary and add an arbiter back to return to your Primary/Secondary/Arbiter config, or add another secondary to the replica set so you have a more robust Primary/Secondary/Secondary deployment.

For a critical production environment I would encourage you to use three data-bearing replica set members instead of two plus an arbiter. The main caveat of a configuration with an arbiter is that if one of your two data-bearing nodes is unavailable, you no longer have active replication or data redundancy.