MongoDB – resizing a MongoDB replica set on AWS

aws mongodb

I have a replica set consisting of 3 Amazon EC2 nodes: a primary, a secondary, and an arbiter.
The data volume size is ~400GB and it is currently 90% full. I need to resize the volumes to a more appropriate size.
I'm looking for the best way to do this with minimum downtime (ideally none) and minimum resource overhead – meaning using the least amount of CPU/IO/network needed for the resize process. Time is not a big issue, as I believe it wouldn't take more than 24 hours.

Here are the logical steps I have come up with so far:

  1. Create, attach & mount a new volume for the secondary
  2. Stop the mongod process on the secondary
  3. Copy the data from the old volume to the new volume
  4. Restart the mongod process on the secondary with dbpath pointing to the new volume, OR swap the mount points for the old and new volumes and then restart the mongod process (see the sketch after this list)
  5. Ensure the secondary finishes syncing the remaining data
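
Roughly, steps 2-4 would look something like this (device names, mount points, and paths are examples from my setup, not prescriptions):

    # Stop mongod on the secondary before touching its data files
    sudo service mongod stop

    # Copy the data files from the old volume to the new one
    sudo rsync -a /data/db/ /mnt/new-volume/db/

    # Either point dbpath in /etc/mongod.conf at /mnt/new-volume/db,
    # or swap the mount points so the new volume appears at the old path, then:
    sudo service mongod start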

===============================================================

  1. Create, attach & mount a new volume for the primary
  2. Step down the primary (see the stepDown sketch after this list)
  3. Ensure all client processes failed over correctly to the new primary
  4. Repeat steps 2-5 above on the former primary (now the secondary)
  5. [optional] Step down the new primary to return to the original setup
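
For step 2, the stepdown itself is a single shell helper run against the primary (the 60-second no-reelection window is just an example value):

    # Ask the primary to step down and not seek re-election for 60 seconds.
    # Note: the shell connection is dropped when the primary steps down.
    mongo --eval "rs.stepDown(60)"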

My questions:

a) When copying files from one volume to another, assuming the bigger volume is also faster (more provisioned IOPS), is it efficient to parallelize – i.e. copy up to ~10 files at a time?

b) Some would say that if I'm already putting in the effort to copy the data, I should simply wipe the data and let it resync from scratch – however, I'm worried about the network and read impact on the "live" primary, which may degrade performance for my applications. Is that a fair assumption?

c) How can I measure/verify that the oplog retention is sufficient for the secondary to catch up after the files have been copied?

d) Is there a better idea/solution?

MongoDB version - 3.0.10
Storage engine - MMAPv1
OS - Amazon Linux
Journaling is on

Thanks

Best Answer

a) When copying files from one volume to another, assuming the bigger volume is also faster (more provisioned IOPS), is it efficient to parallelize – i.e. copy up to ~10 files at a time?

Limiting factors for your transfer are likely to be the read speed of your original storage and the bandwidth between the old and new storage paths. Starting parallel copies is unlikely to speed up the process, and could actually slow down the overall transfer if the copy requires reading multiple files that are randomly located on slow origin storage (e.g. non-SSD).

Some EBS volumes (notably those restored from snapshots) may also benefit from initialization or prewarming to achieve maximum performance after being attached. See: Initializing Amazon EBS Volumes in the EC2 User Guide.
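
If your new volume was restored from a snapshot, initialization boils down to reading every block once before putting the volume into service; a minimal sketch, assuming the volume is attached as /dev/xvdf (check yours with lsblk):

    # Read every block once to force the EBS volume to be initialized
    sudo dd if=/dev/xvdf of=/dev/null bs=1M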

b) Some would say that if I'm already putting in the effort to copy the data, I should simply wipe the data and let it resync from scratch – however, I'm worried about the network and read impact on the "live" primary, which may degrade performance for my applications. Is that a fair assumption?

With MMAPv1 there are some possible benefits to copying the data files vs a resync:

  • You could potentially compress the data files before transferring (see the sketch after this list), which may save transfer time if the files compress well.
  • The new data volume will be usable sooner, as the data files and indexes are already built.
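
For example, if the copy goes over the network you could stream a compressed tarball over SSH instead of copying raw files; the hostnames and paths below are placeholders:

    # Stream the stopped secondary's data files, gzip-compressed, to the new host
    tar -czf - -C /data db | ssh user@new-host 'tar -xzf - -C /mnt/new-volume'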

Some possible benefits of resyncing are:

  • If you have excessive fragmentation or storage usage, the actual data to transfer may be less than the file size on disk.
  • A resync saves the overhead of transferring index data, at the expense of rebuilding the indexes on the destination node.
  • You could change the storage engine to WiredTiger, which includes on-disk compression. If your main concern is disk space, this might alleviate some of your resource pressure. Since this is a more significant change than simply increasing storage space, you definitely want to test against your application in a staging/UAT environment first (a sketch follows below).
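
As a sketch: in MongoDB 3.0 you would switch the resyncing node to WiredTiger by starting it against an empty dbpath with the engine flag and letting initial sync repopulate it (the dbpath and replica set name below are placeholders):

    # Start the wiped secondary with WiredTiger; initial sync rebuilds its data
    mongod --storageEngine wiredTiger --dbpath /mnt/new-volume/db --replSet rs0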

c) How can I measure/verify that the oplog retention is sufficient for the secondary to catch up after the files have been copied?

You could estimate how long the file transfer will take and ensure the oplog window is a comfortable multiple of the worst-case transfer time. I strongly recommend avoiding any approach which leaves you without a viable secondary while you are copying/syncing data; racing against the oplog duration is risky if something goes amiss in the copy/sync process and it takes much longer than you planned for.
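
You can check the current oplog window on the primary with the rs.printReplicationInfo() shell helper, which reports the configured oplog size and the time span between the first and last oplog entries:

    # "log length start to end" is the replication window you're racing against
    mongo --eval "rs.printReplicationInfo()"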

d) Is there a better idea/solution?

You currently have a Primary/Secondary/Arbiter configuration. Instead of compromising replication by stopping your only secondary in order to copy the files, I recommend adding a new secondary with increased storage and removing the arbiter (it is no longer needed once you have an odd number of voting, data-bearing nodes).
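
That reconfiguration is a couple of shell helpers run against the primary; the hostnames and port below are placeholders:

    # Add the new, larger-storage secondary, then drop the arbiter
    mongo --eval 'rs.add("new-secondary.example.net:27017")'
    mongo --eval 'rs.remove("arbiter.example.net:27017")'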

Once your new secondary completes its initial sync you can resize and resync the other secondary in the same way, and finally step down the primary. At that stage you could either drop the former primary and add an arbiter back to return to your Primary/Secondary/Arbiter config, or add another secondary to the replica set so you have a more robust Primary/Secondary/Secondary deployment.

For a critical production environment I would encourage you to use three data-bearing replica set members instead of two plus an arbiter. The main caveat of a configuration with an arbiter is that if one of your two data-bearing nodes is unavailable, you no longer have active replication or data redundancy.