MongoDB replica set stuck in RECOVERING state

Tags: mongodb, replication, restore

We have created a three-member replica set, and two of its members have been in RECOVERING state for 48 hours. At first the data size on the recovering nodes kept increasing, but now even that has stopped: they are stuck at 90 GB of data, with 60+ GB of local data.

How can we get them out of this state?

Best Answer

The easy, albeit somewhat unsafe, way

  1. Stop the first secondary
  2. Delete the contents of its dbpath
  3. Restart the secondary
  4. Wait for it to catch up with the primary
  5. Repeat the process with the second secondary
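Step 4 ("wait for it to catch up") can be checked from the replica set status instead of by eye. Below is a minimal sketch, assuming the status document has the shape returned by the `replSetGetStatus` command (for example via pymongo's `client.admin.command("replSetGetStatus")`): a `members` list whose entries carry `name`, `stateStr` and `optimeDate`. The function name, host names and the 10-second threshold are hypothetical choices for illustration.

```python
from datetime import datetime, timedelta


def member_caught_up(status: dict, member_name: str,
                     max_lag: timedelta = timedelta(seconds=10)) -> bool:
    """Return True if `member_name` is a SECONDARY within `max_lag` of the primary.

    `status` is assumed to look like replSetGetStatus / rs.status() output.
    """
    members = {m["name"]: m for m in status["members"]}
    primary = next((m for m in status["members"] if m["stateStr"] == "PRIMARY"), None)
    member = members.get(member_name)
    if primary is None or member is None:
        return False
    if member["stateStr"] != "SECONDARY":
        # Still syncing (e.g. STARTUP2) or RECOVERING: not caught up yet.
        return False
    # Replication lag = how far the member's last applied op trails the primary's.
    lag = primary["optimeDate"] - member["optimeDate"]
    return lag <= max_lag


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0, 0)  # hypothetical sample status
    sample = {"members": [
        {"name": "db1:27017", "stateStr": "PRIMARY", "optimeDate": now},
        {"name": "db2:27017", "stateStr": "SECONDARY",
         "optimeDate": now - timedelta(seconds=2)},
    ]}
    print(member_caught_up(sample, "db2:27017"))  # True
```

In practice you would poll this in a loop after restarting the secondary and only move on to the second secondary once it returns True.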

This is somewhat unsafe because it is unknown why the secondaries entered the RECOVERING state in the first place, so the same problem may happen again.

The safer, but also more intrusive, way

As above, but stop your application during the process. This rules out the possibility that your application writes data faster than the secondaries are able to replicate it. However, the problem may come back once the application is running in production again.

The safest, but also most intrusive, way

  1. Shut down the whole replica set
  2. Remove the contents of dbpath on both secondaries
  3. Copy the contents of the primary's dbpath to both secondaries' dbpath
  4. Start the old primary.
  5. Start one of the old secondaries.
  6. Wait until a new primary is elected.
  7. Start the remaining secondary.
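Steps 2 and 3 amount to replacing each secondary's data files with a copy of the primary's. Below is a minimal Python sketch, assuming every mongod of the replica set is already shut down (step 1) and that each secondary's dbpath is reachable as a local path; on separate hosts you would copy over the network instead, e.g. with scp or rsync. `clear_dir`, `seed_secondary` and any paths you pass in are hypothetical; `dirs_exist_ok` requires Python 3.8 or later.

```python
import shutil
from pathlib import Path


def clear_dir(path: Path) -> None:
    """Delete everything inside `path`, but keep the directory itself."""
    for entry in path.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)
        else:
            entry.unlink()


def seed_secondary(primary_dbpath: Path, secondary_dbpath: Path) -> None:
    """Replace a secondary's data files with a copy of the primary's.

    Only safe while *all* mongod processes of the replica set are stopped,
    otherwise the copied files may be inconsistent.
    """
    clear_dir(secondary_dbpath)                   # step 2: remove old contents
    shutil.copytree(primary_dbpath, secondary_dbpath,
                    dirs_exist_ok=True)           # step 3: copy primary's files
```

You would call `seed_secondary` once per secondary, then continue with steps 4 to 7.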

Some notes:

Use MMS. It's free, it's easy to set up, and it gives you good information about your replica set. Try to keep the replication lag close to 0, and do whatever it takes to make sure it never exceeds the replication oplog window.
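The relationship between those two metrics can be made concrete. A secondary stays in sync by replaying the primary's oplog, so it can only catch up while the last operation it applied is still in that oplog; once it falls further behind than the oplog window, it is too stale to catch up and needs a full resync as described above. The sketch below assumes you already have four timestamps: the primary's and the secondary's `optimeDate` from `rs.status()`, and the first and last oplog event times that the mongo shell's `rs.printReplicationInfo()` reports. The function and type names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import NamedTuple


class ReplicationHealth(NamedTuple):
    lag: timedelta           # how far the secondary trails the primary
    oplog_window: timedelta  # time span covered by the primary's oplog
    stale: bool              # True once the secondary has fallen off the oplog


def replication_health(primary_optime: datetime, secondary_optime: datetime,
                       oplog_first: datetime, oplog_last: datetime) -> ReplicationHealth:
    lag = primary_optime - secondary_optime
    window = oplog_last - oplog_first
    # The secondary can keep tailing the oplog only while its last applied
    # operation is not older than the oldest entry still in the oplog.
    stale = secondary_optime < oplog_first
    return ReplicationHealth(lag, window, stale)
```

Alerting when `lag` approaches `oplog_window` gives you time to act before `stale` becomes True.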

Always make sure you have a 1 Gbit/s network and a (sorry) shitload of RAM. The more, the better. An additional rule of thumb: prefer half the RAM with SSDs over double the RAM without SSDs (as long as the amount of RAM stays within reasonable limits).
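The network advice matters because a resync pulls all the data over the wire. A back-of-the-envelope lower bound for the roughly 90 GB in the question, assuming decimal units (1 GB = 8 Gbit) and ignoring protocol overhead, disk speed and index builds; the function name is hypothetical:

```python
def min_transfer_seconds(data_gb: float, link_gbit_per_s: float) -> float:
    """Lower bound on the time to copy `data_gb` gigabytes over a link.

    Decimal units; ignores protocol overhead, disk I/O and index builds,
    so a real initial sync takes considerably longer.
    """
    return data_gb * 8 / link_gbit_per_s


print(min_transfer_seconds(90, 1))  # 720.0
```

At 1 Gbit/s that is 720 seconds (12 minutes) at best; on a 100 Mbit/s link the same bound is 7,200 seconds, i.e. two hours, before any real-world overhead.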

Disclaimer: Always make a backup of production data before fiddling with it.