MongoDB – Replica set with no PRIMARY/SECONDARY; members are STARTUP2 and RECOVERING

mongodb

I have a mongo cluster with 6 replica sets. 5 are fine, one is not. Each replica set has three members. Here is the rs.status() for it:

{
    "set" : "rs_5",
    "date" : ISODate("2015-12-16T02:37:39Z"),
    "myState" : 5,
    "members" : [
        {
            "_id" : 0,
            "name" : "mongo_rs_5_member_1:27018",
            "health" : 1,
            "state" : 5,
            "stateStr" : "STARTUP2",
            "uptime" : 33600,
            "optime" : Timestamp(0, 0),
            "optimeDate" : ISODate("1970-01-01T00:00:00Z"),
            "lastHeartbeat" : ISODate("2015-12-16T02:37:38Z"),
            "lastHeartbeatRecv" : ISODate("2015-12-16T02:37:37Z"),
            "pingMs" : 0,
            "lastHeartbeatMessage" : "initial sync need a member to be primary or secondary to do our initial sync"
        },
        {
            "_id" : 1,
            "name" : "mongo_rs_5_member_2:27019",
            "health" : 1,
            "state" : 3,
            "stateStr" : "RECOVERING",
            "uptime" : 33842,
            "optime" : Timestamp(1449898728, 18),
            "optimeDate" : ISODate("2015-12-12T05:38:48Z"),
            "lastHeartbeat" : ISODate("2015-12-16T02:37:37Z"),
            "lastHeartbeatRecv" : ISODate("2015-12-16T02:37:37Z"),
            "pingMs" : 3,
            "lastHeartbeatMessage" : "still syncing, not yet to minValid optime 566bb328:3"
        },
        {
            "_id" : 2,
            "name" : "mongo_rs_5_member_3:27020",
            "health" : 1,
            "state" : 5,
            "stateStr" : "STARTUP2",
            "uptime" : 33845,
            "optime" : Timestamp(1449898728, 18),
            "optimeDate" : ISODate("2015-12-12T05:38:48Z"),
            "errmsg" : "still syncing, not yet to minValid optime 566bb327:1",
            "self" : true
        }
    ],
    "ok" : 1
}

In the logs, I see stuff like:

Wed Dec 16 02:40:34.033 [rsMgr] replSet I don't see a primary and I can't elect myself

and

Tue Dec 15 21:41:27.686 [rsSync] replSet initial sync need a member to be primary or secondary to do our initial sync

Here is rs.conf():

{
    "_id" : "rs_5",
    "version" : 125967,
    "members" : [
        {
            "_id" : 0,
            "host" : "mongo_rs_5_member_1:27018",
            "priority" : 3
        },
        {
            "_id" : 1,
            "host" : "mongo_rs_5_member_2:27019",
            "priority" : 2
        },
        {
            "_id" : 2,
            "host" : "mongo_rs_5_member_3:27020"
        }
    ]
}

It has been like this for a number of days. CPU and network usage show no real activity, which suggests that nothing is happening. I'd like to avoid losing data. What do I need to do to get this back to a healthy PRIMARY/SECONDARY/SECONDARY replica set?

Best Answer

I was able to resolve this by "breaking the mirror". Essentially, I picked one of the members, shut it down, removed the /data/local* files, started it again, and ran rs.initiate(). At that point it was a replica set of one (itself) and therefore primary. For the other two members, I shut them down, wiped everything under /data/*, and started them again. From that first member (now primary), I added the two freshly wiped members with rs.add("mongo_rs_5_member_1:27018") and rs.add("mongo_rs_5_member_2:27019"). The primary then synced all of its data to the other two (this took many hours), and the replica set was healthy again. No more errors in the relevant application.
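Since the other members get wiped, it is probably worth checking which member holds the newest data before choosing the one to keep. The snippet below is only a rough sketch of that check: freshestMembers is a hypothetical helper (not a MongoDB function), and it assumes each member entry has the name and optimeDate fields seen in the rs.status() output above. The sample values are copied from the question.

```javascript
// Hypothetical helper: list the members whose optimeDate is the most recent
// in an rs.status()-style document. Not part of MongoDB itself.
function freshestMembers(status) {
  var newest = -Infinity;
  var names = [];
  status.members.forEach(function (m) {
    // optimeDate is a Date; a member with no data reports the epoch (0).
    var t = m.optimeDate ? m.optimeDate.getTime() : 0;
    if (t > newest) {
      newest = t;
      names = [m.name];
    } else if (t === newest) {
      names.push(m.name); // tie: keep every member at the newest time
    }
  });
  return names;
}

// Sample values copied from the rs.status() output in the question.
var sample = {
  members: [
    { name: "mongo_rs_5_member_1:27018", optimeDate: new Date("1970-01-01T00:00:00Z") },
    { name: "mongo_rs_5_member_2:27019", optimeDate: new Date("2015-12-12T05:38:48Z") },
    { name: "mongo_rs_5_member_3:27020", optimeDate: new Date("2015-12-12T05:38:48Z") }
  ]
};

var candidates = freshestMembers(sample);
```

In the shell this would be called as freshestMembers(rs.status()). For the output in the question, members 2 and 3 tie, while member 1 has no data at all, so it would be a poor choice to keep. Note that optimeDate only has one-second resolution, so a tie here should be double-checked against the full optime Timestamp values.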