You currently have a member stuck running as a secondary because it cannot form a majority on its own. This is why you should always have an odd number of voting nodes in a replica set (one of them can be an arbiter), and I would recommend adding a third node as soon as possible once you get things back to normal.
In terms of how to get the second node up and running, do the following:
- Shut down the remaining Secondary server (not strictly necessary, since it is read-only as a Secondary, but safer)
- Now, copy the whole data directory (everything in dbpath) over to the other host
- Restart both nodes
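A minimal sketch of those steps, assuming hypothetical hosts `node-a` (the good member) and `node-b` (the one being reseeded) and a dbpath of `/data/db` - adjust the hosts, paths, and service commands for your own deployment:

```shell
# On node-a: stop mongod cleanly so the data files are consistent
sudo systemctl stop mongod

# Copy the entire dbpath (everything in it) to the other host
rsync -av /data/db/ node-b:/data/db/

# Restart mongod on both hosts
sudo systemctl start mongod                 # on node-a
ssh node-b 'sudo systemctl start mongod'
```

The key point is that the copy happens while mongod is stopped, so the files on disk are a consistent snapshot the other node can start from.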
One of the advantages to replica sets (over classic master/slave) is that they are intended to be functionally identical to each other, so you can simply use the data from one "good" node to seed any other "bad" node.
Update: April 2018
This answer was correct at the time of the question, but things have moved on since then. Since version 3.4, parallel chunk migrations have been possible, and the ticket I referenced originally has been closed. I cover some of the details in this more recent answer. I will leave the rest of the answer as-is, because it remains a good reference for the general issues and constraints, and is still valid for anyone on an older version.
Original Answer
I give a full explanation of what happens during a chunk migration in the M202 Advanced course if you are interested. In general terms, let's just say that migrations are not very fast, even for empty chunks, because of the housekeeping performed to make sure migrations work in an active system (this housekeeping happens even when nothing but balancing is going on).
Additionally, only one migration happens at a time across the entire cluster - there is no parallelism. So, despite the fact that you have two "full" nodes and two "empty" nodes, at any given time there is at most one migration in flight (between the shard with the most chunks and the shard with the fewest). Hence, adding 2 shards gains you nothing in terms of balancing speed; it just increases the number of chunks that have to be moved.
For the migrations themselves, the chunks are likely ~30MiB in size (this depends on how you populated the data, but generally this will be your average with the default max chunk size). You can run `db.collection.getShardDistribution()` for some of that information, and see my answer here for ways to get even more information about your chunks.
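For example, run the following in the mongo shell connected to mongos (`mydb.mycoll` is a placeholder namespace):

```
use mydb
// per-shard data size, document count, and chunk count for the collection
db.mycoll.getShardDistribution()
```

The output shows, per shard, the data size, number of chunks, and average objects per chunk, which is enough to sanity-check the ~30MiB average mentioned above.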
Since there is no other activity going on, for a migration to happen the target shard (one of the newly added shards) will need to read ~30MiB of data from the source shard (one of the original 2) and update the config servers to reflect the new chunk location once it is done. Moving 30MiB of data should not be much of a bottleneck for a normal system without load.
If it is slow, there are a number of possible reasons why that is the case, but the most common for a system that is not busy are:
- Source Disk I/O - if the data is not in active memory when it is read, it must be paged in from disk
- Network - if there is latency, rate limiting, packet loss etc., then the read may take quite a while
- Target Disk I/O - the data and indexes have to be written to disk; lots of indexes can make this worse, but usually this is not a problem on a lightly loaded system
- Problems with migrations causing aborts and failed migrations (issues with config servers, issues with deletes on primaries)
- Replication lag - for migrations to replica sets, write concern `w:2` or `w:majority` is used by default and requires up-to-date secondaries to satisfy it
If the system were busy, then memory contention and lock contention would be the usual suspects here too.
To get more information about how long migrations are taking, whether they are failing, etc., take a look at the entries in your `config.changelog` collection:

```
// connect to mongos
use config
db.changelog.find()
```
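To focus on migrations specifically, you can filter and sort the changelog (a sketch; `what` values for migrations include `moveChunk.from`, `moveChunk.to`, and `moveChunk.commit`):

```
use config
// ten most recent migration events, newest first
db.changelog.find({ what: /moveChunk/ }).sort({ time: -1 }).limit(10)
```

Each entry's `details` field includes step timings, so you can see where a slow migration spent its time.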
As you have seen, and as I generally tell people when I do training/education, if you know you will need 4 shards, then it's usually better to start with 4 rather than ramp up. If you do, then you need to be aware that adding a shard can take a long time, and initially is a net negative on resources rather than a gain (see part II of my sharding pitfalls series for a more detailed discussion of that).
Finally, to track/upvote/comment on the feature request to improve the parallelism of chunk migrations, check out SERVER-4355.
Best Answer
Normally, a slow initial sync is caused by a slow network connection or VERY slow data disks. Have you measured the transfer speed during the initial sync?
The reason all of the data is copied by rsync is that rsync thinks too much has changed for a delta copy.
And yes, option 3 is the best solution, but only if you make that copy from a snapshot or some other frozen data. Don't do it from live data files that are changing during the copy process.
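A quick way to check whether the disks or the network are the bottleneck (hostnames and paths here are examples; substitute a large file from your own dbpath):

```shell
# Sequential read speed of the source data disk
# (use any multi-GB file in your dbpath; dd reports throughput when it finishes)
dd if=/data/db/somefile of=/dev/null bs=1M count=1024

# Raw network throughput between the two members
dd if=/dev/zero bs=1M count=1024 | ssh other-host 'cat > /dev/null'
```

If either number is far below what the initial sync needs to move your data set in a reasonable time, that is your answer.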