MongoDB – In MongoDB 3.0 replication, how do elections happen when a secondary goes down?

mongodb, mongodb-3.0, replication

Situation: I have a MongoDB replica set spread over two computers.

  • One computer is a server that holds the primary node and the arbiter. This server is a live server and is always on. Its local IP used for replication is 192.168.0.4.
  • The second is a PC that the secondary node resides on, and it is on only for a few hours a day. Its local IP used for replication is 192.168.0.5.

My expectation: I wanted the live server to be the main point of data interaction for my application, regardless of the state of the PC (whether it is reachable or not, since the PC is a secondary), so I wanted to make sure that the server's node is always primary.
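
For reference, this kind of preference is usually expressed through member priorities; the following is only a rough sketch of how one might set them from the mongo shell (the member indexes are assumptions matching my config below):

// Sketch: give the server's member a high priority so it is preferred as primary.
// Member indexes (0 = server, 1 = PC) are assumptions based on the rs.config() output below.
cfg = rs.conf()
cfg.members[0].priority = 10
cfg.members[1].priority = 1
rs.reconfig(cfg)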

The following is the result of rs.config():

liveSet:PRIMARY> rs.config()
{
    "_id" : "liveSet",
    "version" : 2,
    "members" : [
        {
            "_id" : 0,
            "host" : "192.168.0.4:27017",
            "arbiterOnly" : false,
            "buildIndexes" : true,
            "hidden" : false,
            "priority" : 10,
            "tags" : {

            },
            "slaveDelay" : 0,
            "votes" : 1
        },
        {
            "_id" : 1,
            "host" : "192.168.0.5:5051",
            "arbiterOnly" : false,
            "buildIndexes" : true,
            "hidden" : false,
            "priority" : 1,
            "tags" : {

            },
            "slaveDelay" : 0,
            "votes" : 1
        },
        {
            "_id" : 2,
            "host" : "192.168.0.4:5052",
            "arbiterOnly" : true,
            "buildIndexes" : true,
            "hidden" : false,
            "priority" : 1,
            "tags" : {

            },
            "slaveDelay" : 0,
            "votes" : 1
        }
    ],
    "settings" : {
        "chainingAllowed" : true,
        "heartbeatTimeoutSecs" : 10,
        "getLastErrorModes" : {

        },
        "getLastErrorDefaults" : {
            "w" : 1,
            "wtimeout" : 0
        }
    }
}

Also, I have set the storage engine to WiredTiger, in case that matters.
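
If it helps, the active engine can be confirmed from the mongo shell; a minimal sketch:

// Sketch: confirm which storage engine this mongod is actually running.
db.serverStatus().storageEngine   // expected to report { "name" : "wiredTiger", ... }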

What I actually get, and the problem: when I turn off the PC or kill its mongod process, the node on the server becomes a secondary.

The following is the output on the server when I killed the PC's mongod process, while connected to the primary node's shell:

liveSet:PRIMARY>
2015-11-29T10:46:29.471+0430 I NETWORK  Socket recv() errno:10053 An established connection was aborted by the software in your host machine. 127.0.0.1:27017
2015-11-29T10:46:29.473+0430 I NETWORK  SocketException: remote: 127.0.0.1:27017 error: 9001 socket exception [RECV_ERROR] server [127.0.0.1:27017]
2015-11-29T10:46:29.475+0430 I NETWORK  DBClientCursor::init call() failed
2015-11-29T10:46:29.479+0430 I NETWORK  trying reconnect to 127.0.0.1:27017 (127.0.0.1) failed
2015-11-29T10:46:29.481+0430 I NETWORK  reconnect 127.0.0.1:27017 (127.0.0.1) ok
liveSet:SECONDARY>


There are two points I am doubtful about:

  1. Considering this part of the MongoDB documentation:

Replica sets use elections to determine which set member will become primary. Elections occur after initiating a replica set, and also any time the primary becomes unavailable.

An election occurs when the primary is not available (or at the time of initiating the set, but that part does not concern our case). However, the primary was always available, so why does an election happen?

  2. Considering this part of the same documentation:

If a majority of the replica set is inaccessible or unavailable, the replica set cannot accept writes and all remaining members become read-only.

Considering the phrase 'members become read-only': I have two nodes up versus one down, so this should not affect our replica set either.

Now my question: how do I keep the node on the server primary when the node on the PC is not reachable?

Update:
This is the output of rs.status().

Now this makes the behavior obvious, since the arbiter was not reachable.

liveSet:PRIMARY> rs.status()
{
    "set" : "liveSet",
    "date" : ISODate("2015-11-30T04:33:03.864Z"),
    "myState" : 1,
    "members" : [
        {
            "_id" : 0,
            "name" : "192.168.0.4:27017",
            "health" : 1,
            "state" : 1,
            "stateStr" : "PRIMARY",
            "uptime" : 1807553,
            "optime" : Timestamp(1448796026, 1),
            "optimeDate" : ISODate("2015-11-29T11:20:26Z"),
            "electionTime" : Timestamp(1448857488, 1),
            "electionDate" : ISODate("2015-11-30T04:24:48Z"),
            "configVersion" : 2,
            "self" : true
        },
        {
            "_id" : 1,
            "name" : "192.168.0.5:5051",
            "health" : 1,
            "state" : 2,
            "stateStr" : "SECONDARY",
            "uptime" : 496,
            "optime" : Timestamp(1448796026, 1),
            "optimeDate" : ISODate("2015-11-29T11:20:26Z"),
            "lastHeartbeat" : ISODate("2015-11-30T04:33:03.708Z"),
            "lastHeartbeatRecv" : ISODate("2015-11-30T04:33:02.451Z"),
            "pingMs" : 1,
            "configVersion" : 2
        },
        {
            "_id" : 2,
            "name" : "192.168.0.4:5052",
            "health" : 0,
            "state" : 8,
            "stateStr" : "(not reachable/healthy)",
            "uptime" : 0,
            "lastHeartbeat" : ISODate("2015-11-30T04:33:00.008Z"),
            "lastHeartbeatRecv" : ISODate("1970-01-01T00:00:00Z"),
            "configVersion" : -1
        }
    ],
    "ok" : 1
}
liveSet:PRIMARY>

Best Answer

What happened

I had asked in the comments of the question that the OP provide the output of rs.status().

The reason for that was that the primary reverted to secondary status as soon as a single member was shut down. It was obvious that the set had lost the quorum necessary to keep or elect a primary: with three voting members configured, a majority of two must be reachable. That could only be the case if, in addition to the PC, another voting member of the original replica set was or had become unavailable.

As it turned out, the arbiter of the replica set in question was not reachable by the primary which, after the shutdown of the PC, was the only remaining member of the replica set from its own point of view. So it was not possible to reach a quorum of the configured replica set members, and the node consequently reverted to secondary state.
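
A quick way to spot this from the primary's shell is to look for members the primary considers unhealthy; a minimal sketch, using the same health/stateStr fields visible in the rs.status() output above:

// Sketch: list members the current node cannot reach (health 0).
// In the OP's case this would have shown the arbiter at 192.168.0.4:5052.
rs.status().members
    .filter(function (m) { return m.health === 0; })
    .forEach(function (m) { print(m.name + " -> " + m.stateStr); });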

How to prevent

  • Always run rs.status() after setting up a replica set.
  • Always run rs.status() when encountering problems with a replica set.
  • Always do fail tests (down to losing write capabilities, as sketched after this list) and ensure your application handles those situations gracefully (as the OP did).
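
As an example of such a fail test, here is a minimal sketch from the mongo shell; the step-down duration is arbitrary, and the shell may briefly lose and re-establish its connection afterwards:

// Sketch of a simple fail test: ask the current primary to step down
// for ~60 seconds, then inspect how the set reacts.
rs.stepDown(60)
rs.status().members.forEach(function (m) {
    print(m.name + " : " + m.stateStr + " (health " + m.health + ")");
});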

Using these rules, you will eliminate the vast majority of problems one can face when using a replica set.

Personally, I think MongoDB Inc.'s Cloud Manager is a must for production environments, since it shows problems such as the OP's right away and has alerting built in.

Side note

Never, ever (and yes, that means no exceptions, however sound the reasons may seem to be) put an arbiter on a data-bearing node of the same replica set.

Imagine the machine hosting both the arbiter and a data-bearing node goes down.

If you have a 3-member replica set, you would no longer have a quorum of the original members; the remaining member would automatically revert to secondary, and you would lose failover capability.

In a 5-member replica set, two voting members would be eliminated at once. That is fine as long as all the others are up and running, right? Except it isn't fine: if just one more node fails, you lose your quorum again. So with only two machines down, the remaining two data-bearing nodes become more or less useless. Given the price of a virtual server today (and even the smallest ones are easily sufficient to run an arbiter), this simply does not make sense. You would be paying for four data-bearing nodes anyway and lose failover capability because you tried to save a tiny fraction of the overall cost.

With a 7-member replica set, those savings become an even tinier fraction of the total cost.

Conclusion: it is simply a bad decision, business-wise, to have an arbiter running on the same machine as a data-bearing node, even when setting aside the technical aspects.
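
If a dedicated machine (or a small virtual server) is available, moving the arbiter there is a one-liner from the primary's shell. A minimal sketch, where the host name is hypothetical and its mongod is assumed to have been started with --replSet liveSet:

// Sketch: remove the co-located arbiter and add one on its own (hypothetical) host.
rs.remove("192.168.0.4:5052")
rs.addArb("arbiter.example.local:27017")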