MongoDB replica set has no primary

Tags: clustering, mongodb, replication

I have an existing MongoDB setup (part of a Graylog installation) where I'm attempting to configure a replica set with a primary, a secondary, and an arbiter. I had the primary and arbiter configured successfully but then for some reason my primary has become the secondary. The application is now unable to start, and the node won't let me add any other members to the replica set.

my_mongodb_0:SECONDARY> rs.status()
{
    "set" : "my_mongodb_0",
    "date" : ISODate("2019-01-02T19:50:19.291Z"),
    "myState" : 2,
    "term" : NumberLong(12),
    "heartbeatIntervalMillis" : NumberLong(2000),
    "members" : [
        {
            "_id" : 1,
            "name" : "node2:27017",
            "health" : 1,
            "state" : 2,
            "stateStr" : "SECONDARY",
            "uptime" : 9182,
            "optime" : {
                "ts" : Timestamp(1545938968, 1),
                "t" : NumberLong(12)
            },
            "optimeDate" : ISODate("2018-12-27T19:29:28Z"),
            "configVersion" : 166709,
            "self" : true
        },
        {
            "_id" : 2,
            "name" : "arbiter1:27017",
            "health" : 0,
            "state" : 8,
            "stateStr" : "(not reachable/healthy)",
            "uptime" : 0,
            "lastHeartbeat" : ISODate("2019-01-02T19:50:18.868Z"),
            "lastHeartbeatRecv" : ISODate("1970-01-01T00:00:00Z"),
            "pingMs" : NumberLong(0),
            "lastHeartbeatMessage" : "Connection refused",
            "configVersion" : -1
        }
    ],
    "ok" : 1
}

my_mongodb_0:SECONDARY> rs.add( { host: "node1:27017", force: true } )
{
    "ok" : 0,
    "errmsg" : "replSetReconfig should only be run on PRIMARY, but my state is SECONDARY; use the \"force\" argument to override",
    "code" : 10107
}

Ideally I'd like to add the other node, but given the state it's in now, I'm not sure whether I can still do this or whether I have to revert it to a standalone setup first and configure it again. I know a primary election can't happen now, and I haven't found a way to force one, but I wanted to see if anyone else had any ideas.

I'm guessing now it matters what order you add the DB servers and the arbiter, but I haven't been able to find a definitive answer to this.

Best Answer

I had the primary and arbiter configured successfully but then for some reason my primary has become the secondary

The issue is that electing or maintaining a primary requires a strict majority (floor(n/2) + 1) of your configured voting replica set members. A replica set with only two voting members does not provide any fault tolerance for a primary, since votes from both members are required to reach a strict majority.
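To make the majority arithmetic concrete, here is a small sketch as plain JavaScript functions (so it should paste into the mongo shell as well as run under Node.js). Note that n/2 is rounded down, so a strict majority is floor(n/2) + 1. The function names are illustrative only, not part of any MongoDB API.

```javascript
// Votes needed to elect or keep a primary: a strict majority of the
// voting members, floor(n / 2) + 1.
function majority(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// How many voting members can be down or unreachable while the remaining
// ones can still form that majority.
function faultTolerance(votingMembers) {
  return votingMembers - majority(votingMembers);
}
```

For two voting members (node2 plus the arbiter), majority(2) is 2 and faultTolerance(2) is 0, so losing the arbiter costs you the primary. With three voting members (node1, node2 and the arbiter), majority(3) is 2 and faultTolerance(3) is 1.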

Since your arbiter is currently "not reachable/healthy" from node2's perspective, node2 no longer has enough votes to remain primary and will step down to become a secondary.
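The same check can be applied to the rs.status() output from the question. This is a sketch under one stated assumption: every listed member carries the default single vote (rs.status() does not report votes, so confirm with rs.conf() if you have changed them). The helper names are hypothetical.

```javascript
// Count members that the current node sees as healthy (health: 1).
// The node itself is included, since it reports its own health as 1.
function reachableVotes(status) {
  return status.members.filter(function (m) {
    return m.health === 1;
  }).length;
}

// ASSUMPTION: every member in rs.status() has exactly one vote.
// Returns true if the healthy members form a strict majority.
function canReachMajority(status) {
  var needed = Math.floor(status.members.length / 2) + 1;
  return reachableVotes(status) >= needed;
}

// In the shell on node2:
//   canReachMajority(rs.status());
```

For the output above, this should return false: only node2 itself reports health: 1, and one vote out of two is not a majority.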

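The best way to resolve this would be to fix the communication issue between node2 and your arbiter before adding a third member. A "Connection refused" heartbeat message usually means nothing is accepting connections on arbiter1:27017, for example because the arbiter's mongod isn't running or is listening on a different address or port.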

Forced reconfiguration

rs.add( { host: "node1:27017", force: true } )

The force option is only available with rs.reconfig(), so it would be ignored in this usage example. If you pass a document as the host parameter for rs.add(host), any options in it (such as priority or tags) apply only to that individual replica set member.
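To illustrate why options in that document stay scoped to one member, here is a rough model of how rs.add() folds its argument into the replica set configuration. This is an assumption about the shell helper's behavior, not its actual source: a string becomes a member with default settings, and a document is used as that member's entry, with the next free _id. addToConfig is a hypothetical name.

```javascript
// Rough model of rs.add(arg): builds one member entry and appends it to
// cfg (mutates and returns cfg). Illustration only, not the real helper.
function addToConfig(cfg, arg) {
  // New members get one more than the highest existing _id.
  var maxId = Math.max.apply(null, cfg.members.map(function (m) {
    return m._id;
  }));
  // A string is just "host:port"; a document is copied into the member
  // entry, so fields like priority or tags describe only this member.
  var member = typeof arg === "string" ? { host: arg } : Object.assign({}, arg);
  if (member._id === undefined) {
    member._id = maxId + 1;
  }
  cfg.members.push(member);
  return cfg;
}
```

Under this model, { host: "node1:27017", force: true } would only place a force field on node1's member entry. A reconfiguration is forced only when force is passed as an option to rs.reconfig(cfg, { force: true }).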

If you can't get your arbiter back online for some reason, you could follow the forced reconfiguration tutorial to remove the arbiter from your replica set config.
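The core step of that tutorial is to take the current config, drop the unreachable members, and apply it with rs.reconfig(cfg, { force: true }) on a surviving member. A minimal sketch of the "drop the arbiter" part, assuming the arbiter's host in rs.conf() is "arbiter1:27017" as in the rs.status() output; removeMemberByHost is a hypothetical helper name.

```javascript
// Remove the member whose host matches `host` from a replica set config
// document (as returned by rs.conf()). Mutates and returns cfg.
function removeMemberByHost(cfg, host) {
  var before = cfg.members.length;
  cfg.members = cfg.members.filter(function (m) {
    return m.host !== host;
  });
  // Fail loudly rather than force-applying an unchanged config.
  if (cfg.members.length === before) {
    throw new Error("no member with host " + host);
  }
  return cfg;
}

// Last resort only, while connected to node2:
//   var cfg = removeMemberByHost(rs.conf(), "arbiter1:27017");
//   rs.reconfig(cfg, { force: true });
```

With the arbiter removed, node2 is the only voting member and therefore its own majority, so it should be able to become primary; after that, rs.add("node1:27017") and rs.addArb() can be run normally.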

As noted in the tutorial, forced reconfiguration is intended as a last resort rather than for routine administration:

The `force` option forces a new configuration onto the member. Use this procedure only
to recover from catastrophic interruptions. Do not use `force` every time you reconfigure.
Also, do not use the `force` option in any automatic scripts and do not use `force`
when there is still a primary.

I'm guessing now it matters what order you add the DB servers and the arbiter

The order isn't significant aside from adding an initial data-bearing node so the replica set has a primary. However, you do need to ensure you have a majority of voting members online and able to communicate with each other.
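For a fresh deployment, that ordering can be sketched as a mongo shell script. Assumptions: each mongod was started with --replSet my_mongodb_0, the hostnames are the ones from the question, and the script runs while connected to node2. rs and sleep are shell globals; they are passed in as parameters only so the sequence can be exercised with stubs, and buildReplicaSet and check are hypothetical names.

```javascript
// Throw if a shell helper returned an error document such as
// { ok: 0, errmsg: ... }.
function check(res, step) {
  if (!res || res.ok !== 1) {
    throw new Error(step + " failed: " + JSON.stringify(res));
  }
}

function buildReplicaSet(rs, sleep) {
  // 1. Start the set with a single data-bearing member (the node we're on).
  check(rs.initiate({
    _id: "my_mongodb_0",
    members: [{ _id: 0, host: "node2:27017" }]
  }), "rs.initiate");

  // 2. Wait until this member is PRIMARY (myState 1); reconfiguration
  //    helpers such as rs.add() are rejected on a SECONDARY.
  while (rs.status().myState !== 1) {
    sleep(1000);
  }

  // 3. Add the second data-bearing node, then the arbiter.
  check(rs.add("node1:27017"), "rs.add");
  check(rs.addArb("arbiter1:27017"), "rs.addArb");
}

// In the shell, connected to node2:
//   buildReplicaSet(rs, sleep);
```

The arbiter could equally be added before node1; what matters is that the set already has a primary when rs.add() or rs.addArb() runs, and that a majority of voting members stays reachable afterwards.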