MongoDB replica set: no election of primary with 6 of 7 nodes up

high-availability mongodb

I have 7 instances running on Scaleway: 4 data-bearing members and 3 arbiters.

The MongoDB versions on them are:

mongodb2  (mongodb 3.4.9)
mongodb4  (mongodb 3.4.15)
mongodb5  (mongodb 3.4.15)
mongodb6  (mongodb 3.4.15)
mongodb-arb1 (mongodb 3.4.18)
mongodb-arb2 (mongodb 3.4.18)
mongodb-arb3 (mongodb 3.4.18)

I need to shut down the mongodb2 instance, so I'm trying to make one of the other data members become PRIMARY. I have tried both stepping down and shutting down mongodb2, but neither works.

Below is the output of rs.stepDown():

{
        "ok" : 0,
        "errmsg" : "No electable secondaries caught up as of 2018-12-07T16:36:04.546+0300. Please use {force: true} to force node to step down.",
        "code" : 50,
        "codeName" : "ExceededTimeLimit"
}

When I use force: true, as in db.adminCommand( { replSetStepDown: 120, secondaryCatchUpPeriodSecs:15, force: true } ), the output is:

QUERY    [thread1] Error: error doing query: failed: network error while attempting to run command 'replSetStepDown' on host '127.0.0.1:27017'  :
DB.prototype.runCommand@src/mongo/shell/db.js:132:1
DB.prototype.adminCommand@src/mongo/shell/db.js:150:16
@(shell):1:1
2018-12-07T16:37:29.322+0300 I NETWORK  [thread1] trying reconnect to 127.0.0.1:27017 (127.0.0.1) failed
2018-12-07T16:37:29.324+0300 I NETWORK  [thread1] reconnect 127.0.0.1:27017 (127.0.0.1) ok

mongodb2 then becomes SECONDARY, but becomes PRIMARY again after a few seconds.

Below is the replica set member status, taken from mongodb4, after I simply power off the primary node, mongodb2.

[
            {
                    "_id" : 1,
                    "name" : "mongodb2:27017",
                    "health" : 0,
                    "state" : 8,
                    "stateStr" : "(not reachable/healthy)",
                    "uptime" : 0,
                    "optime" : {
                            "ts" : Timestamp(0, 0),
                            "t" : NumberLong(-1)
                    },
                    "optimeDurable" : {
                            "ts" : Timestamp(0, 0),
                            "t" : NumberLong(-1)
                    },
                    "optimeDate" : ISODate("1970-01-01T00:00:00Z"),
                    "optimeDurableDate" : ISODate("1970-01-01T00:00:00Z"),
                    "lastHeartbeat" : ISODate("2018-12-07T13:25:53.187Z"),
                    "lastHeartbeatRecv" : ISODate("2018-12-07T13:25:22.796Z"),
                    "pingMs" : NumberLong(0),
                    "lastHeartbeatMessage" : "Connection refused",
                    "configVersion" : -1
            },
            {
                    "_id" : 3,
                    "name" : "mongodb4:27017",
                    "health" : 1,
                    "state" : 2,
                    "stateStr" : "SECONDARY",
                    "uptime" : 11771707,
                    "optime" : {
                            "ts" : Timestamp(1544189117, 1),
                            "t" : NumberLong(23)
                    },
                    "optimeDate" : ISODate("2018-12-07T13:25:17Z"),
                    "infoMessage" : "could not find member to sync from",
                    "configVersion" : 11,
                    "self" : true
            },
            {
                    "_id" : 4,
                    "name" : "mongodb5:27017",
                    "health" : 1,
                    "state" : 2,
                    "stateStr" : "SECONDARY",
                    "uptime" : 11771703,
                    "optime" : {
                            "ts" : Timestamp(1544189117, 1),
                            "t" : NumberLong(23)
                    },
                    "optimeDurable" : {
                            "ts" : Timestamp(1544189117, 1),
                            "t" : NumberLong(23)
                    },
                    "optimeDate" : ISODate("2018-12-07T13:25:17Z"),
                    "optimeDurableDate" : ISODate("2018-12-07T13:25:17Z"),
                    "lastHeartbeat" : ISODate("2018-12-07T13:25:53.161Z"),
                    "lastHeartbeatRecv" : ISODate("2018-12-07T13:25:53.141Z"),
                    "pingMs" : NumberLong(1),
                    "configVersion" : 11
            },
            {
                    "_id" : 5,
                    "name" : "mongodb6:27017",
                    "health" : 1,
                    "state" : 2,
                    "stateStr" : "SECONDARY",
                    "uptime" : 4982,
                    "optime" : {
                            "ts" : Timestamp(1544189117, 1),
                            "t" : NumberLong(23)
                    },
                    "optimeDurable" : {
                            "ts" : Timestamp(1544189117, 1),
                            "t" : NumberLong(23)
                    },
                    "optimeDate" : ISODate("2018-12-07T13:25:17Z"),
                    "optimeDurableDate" : ISODate("2018-12-07T13:25:17Z"),
                    "lastHeartbeat" : ISODate("2018-12-07T13:25:53.162Z"),
                    "lastHeartbeatRecv" : ISODate("2018-12-07T13:25:53.139Z"),
                    "pingMs" : NumberLong(0),
                    "configVersion" : 11
            },
            {
                    "_id" : 6,
                    "name" : "mongodb-arb:27017",
                    "health" : 1,
                    "state" : 7,
                    "stateStr" : "ARBITER",
                    "uptime" : 5462,
                    "lastHeartbeat" : ISODate("2018-12-07T13:25:53.162Z"),
                    "lastHeartbeatRecv" : ISODate("2018-12-07T13:25:52.730Z"),
                    "pingMs" : NumberLong(1),
                    "configVersion" : 11
            },
            {
                    "_id" : 7,
                    "name" : "mongodb-arb:27018",
                    "health" : 1,
                    "state" : 7,
                    "stateStr" : "ARBITER",
                    "uptime" : 295,
                    "lastHeartbeat" : ISODate("2018-12-07T13:25:53.162Z"),
                    "lastHeartbeatRecv" : ISODate("2018-12-07T13:25:52.726Z"),
                    "pingMs" : NumberLong(0),
                    "configVersion" : 11
            },
            {
                    "_id" : 8,
                    "name" : "mongodb-arb:27019",
                    "health" : 1,
                    "state" : 7,
                    "stateStr" : "ARBITER",
                    "uptime" : 162,
                    "lastHeartbeat" : ISODate("2018-12-07T13:25:53.162Z"),
                    "lastHeartbeatRecv" : ISODate("2018-12-07T13:25:52.773Z"),
                    "pingMs" : NumberLong(1),
                    "configVersion" : 11
            }
    ]

In this state nothing changes, no matter how long I wait. The healthy data members just keep logging:

2018-12-07T13:25:48.166+0000 I ASIO     [NetworkInterfaceASIO-Replication-0] Connecting to mongodb2:27017
2018-12-07T13:25:48.167+0000 I ASIO     [NetworkInterfaceASIO-Replication-0] Failed to connect to mongodb2:27017 - HostUnreachable: Connection refused
2018-12-07T13:25:48.167+0000 I ASIO     [NetworkInterfaceASIO-Replication-0] Dropping all pooled connections to mongodb2:27017 due to failed operation on a connection
2018-12-07T13:25:48.167+0000 I REPL     [ReplicationExecutor] Error in heartbeat request to mongodb2:27017; HostUnreachable: Connection refused

and no PRIMARY node is elected.

Best Answer

It turns out that my 3 other data nodes all had 0 votes and 0 priority, which made them non-voting members. A member with priority 0 can never become primary, so when mongodb2 is down, the remaining data members can neither vote nor stand for election, and no primary is elected.

That aside, I don't know why my other data nodes were given 0 priority and 0 votes, because I did nothing explicit to set them that way.
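
The vote and priority settings described in this answer can be checked offline with a small helper. What follows is only a sketch in plain JavaScript: summarizeElection and exampleConfig are hypothetical names, the member list is a hand-written stand-in for what rs.conf() would return, and the arbiter votes are an assumption (the question does not show them). The model is deliberately simplified: it checks only for a majority of votes and at least one reachable non-arbiter member with priority above 0.

```javascript
// Hypothetical config mirroring the situation in the answer.
// In the mongo shell, the real document would come from rs.conf().
// Assumption: each arbiter keeps the default of 1 vote.
const exampleConfig = {
  members: [
    { _id: 1, host: "mongodb2:27017", votes: 1, priority: 1 },
    { _id: 3, host: "mongodb4:27017", votes: 0, priority: 0 },
    { _id: 4, host: "mongodb5:27017", votes: 0, priority: 0 },
    { _id: 5, host: "mongodb6:27017", votes: 0, priority: 0 },
    { _id: 6, host: "mongodb-arb:27017", arbiterOnly: true, votes: 1, priority: 0 },
    { _id: 7, host: "mongodb-arb:27018", arbiterOnly: true, votes: 1, priority: 0 },
    { _id: 8, host: "mongodb-arb:27019", arbiterOnly: true, votes: 1, priority: 0 },
  ],
};

// Simplified check: can the members that are still up elect a primary?
function summarizeElection(config, downHosts) {
  const down = new Set(downHosts);
  // votes and priority both default to 1 when omitted from the config.
  const votesOf = (m) => (m.votes === undefined ? 1 : m.votes);
  const priorityOf = (m) => (m.priority === undefined ? 1 : m.priority);

  const up = config.members.filter((m) => !down.has(m.host));
  const totalVotes = config.members.reduce((sum, m) => sum + votesOf(m), 0);
  const votesUp = up.reduce((sum, m) => sum + votesOf(m), 0);
  const majority = Math.floor(totalVotes / 2) + 1;

  // Only data-bearing members with priority > 0 may stand for election.
  const candidates = up
    .filter((m) => !m.arbiterOnly && priorityOf(m) > 0)
    .map((m) => m.host);

  return {
    totalVotes,
    votesUp,
    majority,
    candidates,
    canElect: votesUp >= majority && candidates.length > 0,
  };
}

// State from the question: mongodb2 powered off.
const before = summarizeElection(exampleConfig, ["mongodb2:27017"]);

// Same set after giving every data member votes: 1 and priority: 1.
const fixedConfig = {
  members: exampleConfig.members.map((m) =>
    m.arbiterOnly ? m : { ...m, votes: 1, priority: 1 }
  ),
};
const after = summarizeElection(fixedConfig, ["mongodb2:27017"]);

console.log(before);
console.log(after);
```

With these assumed values, the first summary has an empty candidates list, so canElect is false even though the arbiters are reachable. Once the data members have votes: 1 and priority: 1, the second summary lists three candidates. On a live set, the equivalent change would be to edit the members in the document returned by rs.conf() and apply it with rs.reconfig(); if no primary is available, rs.reconfig(cfg, { force: true }) run from a surviving member is the usual fallback and should be used with care.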