MongoDB – Replica Set Question – one primary instance with two secondary instances

database-design, mongodb, replication

The secondaries are running on AWS Ubuntu 16.04, and the primary is an Ubuntu 16.04 VM running locally.

While watching the log, I noticed that the primary switches to secondary whenever a secondary comes online (to let the secondary sync), and then switches back to primary.

Is there a way to stop that? This seems worrisome, as I am already setting priority to 0 on the secondary instances. I need the primary to be writable at all times – I can't have it switch to secondary whenever one of the secondaries fails and has to reconnect.

Here is what the config looks like:

rs.config()
{
  "_id" : "dbname",
  "version" : 11,
  "protocolVersion" : NumberLong(1),
  "members" : [
    {
      "_id" : 0,
      "host" : "Primary",
      "arbiterOnly" : false,
      "buildIndexes" : true,
      "hidden" : false,
      "priority" : 1,
      "tags" : {

      },
      "slaveDelay" : NumberLong(0),
      "votes" : 1
    },
    {
      "_id" : 1,
      "host" : "Secondary",
      "arbiterOnly" : false,
      "buildIndexes" : true,
      "hidden" : false,
      "priority" : 0,
      "tags" : {

      },
      "slaveDelay" : NumberLong(0),
      "votes" : 1
    },
    {
      "_id" : 2,
      "host" : "Secondary",
      "arbiterOnly" : false,
      "buildIndexes" : true,
      "hidden" : false,
      "priority" : 0,
      "tags" : {

      },
      "slaveDelay" : NumberLong(0),
      "votes" : 1
    }
  ],
  "settings" : {
    "chainingAllowed" : true,
    "heartbeatIntervalMillis" : 2000,
    "heartbeatTimeoutSecs" : 10,
    "electionTimeoutMillis" : 10000,
    "catchUpTimeoutMillis" : -1,
    "catchUpTakeoverDelayMillis" : 30000,
    "getLastErrorModes" : {

    },
    "getLastErrorDefaults" : {
      "w" : 1,
      "wtimeout" : 0
    },
    "replicaSetId" : ObjectId("someid")
  }
}

Also, here is the full log from the primary:


2018-02-01T09:20:02.984-0800 I CONTROL  [initandlisten] MongoDB starting : pid=19294 port=27017 dbpath=/srv/mongodb/db0 64-bit host=ubuntu-cheftest
2018-02-01T09:20:02.984-0800 I CONTROL  [initandlisten] db version v3.6.2
2018-02-01T09:20:02.985-0800 I CONTROL  [initandlisten] git version: 489d177dbd0f0420a8ca04d39fd78d0a2c539420
2018-02-01T09:20:02.985-0800 I CONTROL  [initandlisten] OpenSSL version: OpenSSL 1.0.2g  1 Mar 2016
2018-02-01T09:20:02.985-0800 I CONTROL  [initandlisten] allocator: tcmalloc
2018-02-01T09:20:02.985-0800 I CONTROL  [initandlisten] modules: none
2018-02-01T09:20:02.985-0800 I CONTROL  [initandlisten] build environment:
2018-02-01T09:20:02.985-0800 I CONTROL  [initandlisten]     distmod: ubuntu1604
2018-02-01T09:20:02.985-0800 I CONTROL  [initandlisten]     distarch: x86_64
2018-02-01T09:20:02.985-0800 I CONTROL  [initandlisten]     target_arch: x86_64
2018-02-01T09:20:02.985-0800 I CONTROL  [initandlisten] options: { net: { bindIp: "127.0.0.1,192.168.6.39", port: 27017 }, replication: { replSet: "mdblid" }, storage: { dbPath: "/srv/mongodb/db0" } }
2018-02-01T09:20:02.985-0800 I -        [initandlisten] Detected data files in /srv/mongodb/db0 created by the 'wiredTiger' storage engine, so setting the active storage engine to 'wiredTiger'.
2018-02-01T09:20:02.985-0800 I STORAGE  [initandlisten]
2018-02-01T09:20:02.985-0800 I STORAGE  [initandlisten] ** WARNING: Using the XFS filesystem is strongly recommended with the WiredTiger storage engine
2018-02-01T09:20:02.985-0800 I STORAGE  [initandlisten] **          See http://dochub.mongodb.org/core/prodnotes-filesystem
2018-02-01T09:20:02.985-0800 I STORAGE  [initandlisten] wiredtiger_open config: create,cache_size=488M,session_max=20000,eviction=(threads_min=4,threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000),statistics_log=(wait=0),verbose=(recovery_progress),
2018-02-01T09:20:03.677-0800 I STORAGE  [initandlisten] WiredTiger message [1517505603:677241][19294:0x7f156a0f69c0], txn-recover: Main recovery loop: starting at 71/8704
2018-02-01T09:20:03.882-0800 I STORAGE  [initandlisten] WiredTiger message [1517505603:882098][19294:0x7f156a0f69c0], txn-recover: Recovering log 71 through 72
2018-02-01T09:20:03.991-0800 I STORAGE  [initandlisten] WiredTiger message [1517505603:991545][19294:0x7f156a0f69c0], txn-recover: Recovering log 72 through 72
2018-02-01T09:20:04.136-0800 I STORAGE  [initandlisten] Starting WiredTigerRecordStoreThread local.oplog.rs
2018-02-01T09:20:04.136-0800 I STORAGE  [initandlisten] The size storer reports that the oplog contains 4572 records totaling to 503638 bytes
2018-02-01T09:20:04.136-0800 I STORAGE  [initandlisten] Scanning the oplog to determine where to place markers for truncation
2018-02-01T09:20:04.149-0800 I CONTROL  [initandlisten]
2018-02-01T09:20:04.149-0800 I CONTROL  [initandlisten] ** WARNING: Access control is not enabled for the database.
2018-02-01T09:20:04.150-0800 I CONTROL  [initandlisten] **          Read and write access to data and configuration is unrestricted.
2018-02-01T09:20:04.150-0800 I CONTROL  [initandlisten] ** WARNING: You are running this process as the root user, which is not recommended.
2018-02-01T09:20:04.150-0800 I CONTROL  [initandlisten]
2018-02-01T09:20:04.150-0800 I CONTROL  [initandlisten]
2018-02-01T09:20:04.150-0800 I CONTROL  [initandlisten] ** WARNING: /sys/kernel/mm/transparent_hugepage/enabled is 'always'.
2018-02-01T09:20:04.150-0800 I CONTROL  [initandlisten] **        We suggest setting it to 'never'
2018-02-01T09:20:04.150-0800 I CONTROL  [initandlisten]
2018-02-01T09:20:04.150-0800 I CONTROL  [initandlisten] ** WARNING: /sys/kernel/mm/transparent_hugepage/defrag is 'always'.
2018-02-01T09:20:04.150-0800 I CONTROL  [initandlisten] **        We suggest setting it to 'never'
2018-02-01T09:20:04.150-0800 I CONTROL  [initandlisten]
2018-02-01T09:20:04.205-0800 I FTDC     [initandlisten] Initializing full-time diagnostic data capture with directory '/srv/mongodb/db0/diagnostic.data'
2018-02-01T09:20:04.208-0800 I REPL     [initandlisten] No oplog entries to apply for recovery. appliedThrough and checkpointTimestamp are both null.
2018-02-01T09:20:04.209-0800 I NETWORK  [initandlisten] waiting for connections on port 27017
2018-02-01T09:20:04.239-0800 W NETWORK  [replexec-0] Failed to connect to 172.31.28.232:27017, in(checking socket for error after poll), reason: Connection refused
2018-02-01T09:20:04.269-0800 W NETWORK  [replexec-0] Failed to connect to 172.31.24.194:27017, in(checking socket for error after poll), reason: Connection refused
2018-02-01T09:20:04.269-0800 I REPL     [replexec-0] New replica set config in use: { _id: "mdblid", version: 11, protocolVersion: 1, members: [ { _id: 0, host: "192.168.6.39:27017", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 }, { _id: 2, host: "mongo3:27017", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 0.0, tags: {}, slaveDelay: 0, votes: 1 }, { _id: 3, host: "mongo2:27017", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 0.0, tags: {}, slaveDelay: 0, votes: 1 } ], settings: { chainingAllowed: true, heartbeatIntervalMillis: 2000, heartbeatTimeoutSecs: 10, electionTimeoutMillis: 10000, catchUpTimeoutMillis: -1, catchUpTakeoverDelayMillis: 30000, getLastErrorModes: {}, getLastErrorDefaults: { w: 1, wtimeout: 0 }, replicaSetId: ObjectId('5a68d17b6bae4407d0183a21') } }
2018-02-01T09:20:04.269-0800 I REPL     [replexec-0] This node is 192.168.6.39:27017 in the config
2018-02-01T09:20:04.269-0800 I REPL     [replexec-0] transition to STARTUP2 from STARTUP
2018-02-01T09:20:04.269-0800 I REPL     [replexec-0] Starting replication storage threads
2018-02-01T09:20:04.270-0800 I ASIO     [NetworkInterfaceASIO-Replication-0] Connecting to mongo2:27017
2018-02-01T09:20:04.273-0800 I ASIO     [NetworkInterfaceASIO-Replication-0] Connecting to mongo3:27017
2018-02-01T09:20:04.274-0800 I REPL     [replexec-0] transition to RECOVERING from STARTUP2
2018-02-01T09:20:04.274-0800 I REPL     [replexec-0] Starting replication fetcher thread
2018-02-01T09:20:04.274-0800 I REPL     [replexec-0] Starting replication applier thread
2018-02-01T09:20:04.274-0800 I REPL     [replexec-0] Starting replication reporter thread
2018-02-01T09:20:04.278-0800 I REPL     [rsSync] transition to SECONDARY from RECOVERING
2018-02-01T09:20:04.303-0800 I ASIO     [NetworkInterfaceASIO-Replication-0] Failed to connect to mongo2:27017 - HostUnreachable: Connection refused
2018-02-01T09:20:04.303-0800 I ASIO     [NetworkInterfaceASIO-Replication-0] Dropping all pooled connections to mongo2:27017 due to failed operation on a connection
2018-02-01T09:20:04.304-0800 I REPL_HB  [replexec-1] Error in heartbeat (requestId: 1) to mongo2:27017, response status: HostUnreachable: Connection refused
2018-02-01T09:20:04.304-0800 I ASIO     [NetworkInterfaceASIO-Replication-0] Failed to connect to mongo3:27017 - HostUnreachable: Connection refused
2018-02-01T09:20:04.304-0800 I ASIO     [NetworkInterfaceASIO-Replication-0] Dropping all pooled connections to mongo3:27017 due to failed operation on a connection
2018-02-01T09:20:04.304-0800 I REPL_HB  [replexec-0] Error in heartbeat (requestId: 3) to mongo3:27017, response status: HostUnreachable: Connection refused
2018-02-01T09:20:04.304-0800 I ASIO     [NetworkInterfaceASIO-Replication-0] Connecting to mongo2:27017
2018-02-01T09:20:04.304-0800 I ASIO     [NetworkInterfaceASIO-Replication-0] Connecting to mongo3:27017
2018-02-01T09:20:04.334-0800 I ASIO     [NetworkInterfaceASIO-Replication-0] Failed to connect to mongo2:27017 - HostUnreachable: Connection refused
2018-02-01T09:20:04.334-0800 I ASIO     [NetworkInterfaceASIO-Replication-0] Dropping all pooled connections to mongo2:27017 due to failed operation on a connection
2018-02-01T09:20:04.334-0800 I REPL_HB  [replexec-1] Error in heartbeat (requestId: 5) to mongo2:27017, response status: HostUnreachable: Connection refused
2018-02-01T09:20:04.334-0800 I ASIO     [NetworkInterfaceASIO-Replication-0] Failed to connect to mongo3:27017 - HostUnreachable: Connection refused
2018-02-01T09:20:04.334-0800 I ASIO     [NetworkInterfaceASIO-Replication-0] Dropping all pooled connections to mongo3:27017 due to failed operation on a connection
2018-02-01T09:20:04.334-0800 I ASIO     [NetworkInterfaceASIO-Replication-0] Connecting to mongo2:27017
2018-02-01T09:20:04.334-0800 I REPL_HB  [replexec-2] Error in heartbeat (requestId: 7) to mongo3:27017, response status: HostUnreachable: Connection refused
2018-02-01T09:20:04.335-0800 I ASIO     [NetworkInterfaceASIO-Replication-0] Connecting to mongo3:27017
2018-02-01T09:20:04.365-0800 I ASIO     [NetworkInterfaceASIO-Replication-0] Failed to connect to mongo3:27017 - HostUnreachable: Connection refused
2018-02-01T09:20:04.365-0800 I ASIO     [NetworkInterfaceASIO-Replication-0] Dropping all pooled connections to mongo3:27017 due to failed operation on a connection
2018-02-01T09:20:04.365-0800 I REPL_HB  [replexec-0] Error in heartbeat (requestId: 11) to mongo3:27017, response status: HostUnreachable: Connection refused
2018-02-01T09:20:04.365-0800 I REPL     [replexec-0] Member mongo3:27017 is now in state RS_DOWN
2018-02-01T09:20:04.365-0800 I ASIO     [NetworkInterfaceASIO-Replication-0] Failed to connect to mongo2:27017 - HostUnreachable: Connection refused
2018-02-01T09:20:04.365-0800 I ASIO     [NetworkInterfaceASIO-Replication-0] Dropping all pooled connections to mongo2:27017 due to failed operation on a connection
2018-02-01T09:20:04.365-0800 I REPL_HB  [replexec-2] Error in heartbeat (requestId: 9) to mongo2:27017, response status: HostUnreachable: Connection refused
2018-02-01T09:20:04.366-0800 I REPL     [replexec-2] Member mongo2:27017 is now in state RS_DOWN
2018-02-01T09:20:05.656-0800 I NETWORK  [listener] connection accepted from 172.31.28.232:56742 #1 (1 connection now open)
2018-02-01T09:20:05.686-0800 I NETWORK  [conn1] end connection 172.31.28.232:56742 (0 connections now open)
2018-02-01T09:20:05.718-0800 I NETWORK  [listener] connection accepted from 172.31.28.232:56746 #2 (1 connection now open)
2018-02-01T09:20:05.719-0800 I NETWORK  [conn2] received client metadata from 172.31.28.232:56746 conn: { driver: { name: "NetworkInterfaceASIO-Replication", version: "3.6.2" }, os: { type: "Linux", name: "Ubuntu", architecture: "x86_64", version: "16.04" } }
2018-02-01T09:20:07.782-0800 I NETWORK  [listener] connection accepted from 172.31.24.194:39638 #3 (2 connections now open)
2018-02-01T09:20:07.812-0800 I NETWORK  [conn3] end connection 172.31.24.194:39638 (1 connection now open)
2018-02-01T09:20:07.846-0800 I NETWORK  [listener] connection accepted from 172.31.24.194:39642 #4 (2 connections now open)
2018-02-01T09:20:07.847-0800 I NETWORK  [conn4] received client metadata from 172.31.24.194:39642 conn: { driver: { name: "NetworkInterfaceASIO-Replication", version: "3.6.2" }, os: { type: "Linux", name: "Ubuntu", architecture: "x86_64", version: "16.04" } }
2018-02-01T09:20:09.365-0800 I ASIO     [NetworkInterfaceASIO-Replication-0] Connecting to mongo2:27017
2018-02-01T09:20:09.365-0800 I ASIO     [NetworkInterfaceASIO-Replication-0] Connecting to mongo3:27017
2018-02-01T09:20:09.426-0800 I ASIO     [NetworkInterfaceASIO-Replication-0] Successfully connected to mongo2:27017, took 61ms (1 connections now open to mongo2:27017)
2018-02-01T09:20:09.427-0800 I ASIO     [NetworkInterfaceASIO-Replication-0] Successfully connected to mongo3:27017, took 62ms (1 connections now open to mongo3:27017)
2018-02-01T09:20:09.456-0800 I REPL     [replexec-2] Member mongo3:27017 is now in state SECONDARY
2018-02-01T09:20:09.459-0800 I REPL     [replexec-1] Member mongo2:27017 is now in state SECONDARY
2018-02-01T09:20:15.302-0800 I REPL     [replexec-1] Starting an election, since we've seen no PRIMARY in the past 10000ms
2018-02-01T09:20:15.302-0800 I REPL     [replexec-1] conducting a dry run election to see if we could be elected. current term: 41
2018-02-01T09:20:15.332-0800 I REPL     [replexec-0] VoteRequester(term 41 dry run) received a yes vote from mongo3:27017; response message: { term: 41, voteGranted: true, reason: "", ok: 1.0, operationTime: Timestamp(1517505592, 1), $clusterTime: { clusterTime: Timestamp(1517505592, 1), signature: { hash: BinData(0, 0000000000000000000000000000000000000000), keyId: 0 } } }
2018-02-01T09:20:15.332-0800 I REPL     [replexec-0] dry election run succeeded, running for election in term 42
2018-02-01T09:20:15.333-0800 I ASIO     [NetworkInterfaceASIO-Replication-0] Ending connection to host mongo2:27017 due to bad connection status; 0 connections to that host remain open
2018-02-01T09:20:15.334-0800 I ASIO     [NetworkInterfaceASIO-Replication-0] Connecting to mongo2:27017
2018-02-01T09:20:15.365-0800 I REPL     [replexec-1] VoteRequester(term 42) received a yes vote from mongo3:27017; response message: { term: 42, voteGranted: true, reason: "", ok: 1.0, operationTime: Timestamp(1517505592, 1), $clusterTime: { clusterTime: Timestamp(1517505592, 1), signature: { hash: BinData(0, 0000000000000000000000000000000000000000), keyId: 0 } } }
2018-02-01T09:20:15.365-0800 I REPL     [replexec-1] election succeeded, assuming primary role in term 42
2018-02-01T09:20:15.365-0800 I REPL     [replexec-1] transition to PRIMARY from SECONDARY
2018-02-01T09:20:15.366-0800 I REPL     [replexec-1] Entering primary catch-up mode.
2018-02-01T09:20:15.366-0800 I ASIO     [NetworkInterfaceASIO-Replication-0] Connecting to mongo2:27017
2018-02-01T09:20:15.395-0800 I ASIO     [NetworkInterfaceASIO-Replication-0] Successfully connected to mongo2:27017, took 61ms (2 connections now open to mongo2:27017)
2018-02-01T09:20:15.426-0800 I REPL     [replexec-0] Caught up to the latest optime known via heartbeats after becoming primary.
2018-02-01T09:20:15.426-0800 I REPL     [replexec-0] Exited primary catch-up mode.
2018-02-01T09:20:15.434-0800 I ASIO     [NetworkInterfaceASIO-Replication-0] Successfully connected to mongo2:27017, took 67ms (2 connections now open to mongo2:27017)
2018-02-01T09:20:17.282-0800 I REPL     [rsSync] transition to primary complete; database writes are now permitted
2018-02-01T09:20:18.849-0800 I NETWORK  [listener] connection accepted from 172.31.24.194:39646 #5 (3 connections now open)
2018-02-01T09:20:18.850-0800 I NETWORK  [conn5] received client metadata from 172.31.24.194:39646 conn: { driver: { name: "NetworkInterfaceASIO-RS", version: "3.6.2" }, os: { type: "Linux", name: "Ubuntu", architecture: "x86_64", version: "16.04" } }
2018-02-01T09:20:18.970-0800 I NETWORK  [listener] connection accepted from 172.31.24.194:39648 #6 (4 connections now open)
2018-02-01T09:20:18.971-0800 I NETWORK  [conn6] received client metadata from 172.31.24.194:39648 conn: { driver: { name: "NetworkInterfaceASIO-RS", version: "3.6.2" }, os: { type: "Linux", name: "Ubuntu", architecture: "x86_64", version: "16.04" } }

Best Answer

Given your current replica set configuration, this behaviour is expected.

In order to be elected (and remain) primary, a replica set member must have connectivity with a majority of voting replica set members.

Your configuration has two members with priority 0 that cannot be elected as primary, so there is only one candidate for primary (the member with _id: 0). However, all of the members in your replica set are voting members, so the required majority to elect a primary is 2 of the 3 configured members.
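
For reference, here is a minimal mongo shell sketch (assuming the rs.conf() output shown above) that counts the voting members and the strict majority required to elect a primary:

// Count voting members in the current configuration and the
// strict majority required to elect (and sustain) a primary.
var cfg = rs.conf()
var voting = cfg.members.filter(function (m) { return m.votes > 0 }).length
var majority = Math.floor(voting / 2) + 1
print("voting members: " + voting + ", majority required: " + majority)
// With the configuration above: 3 voting members, so a majority of 2 is required.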

Based on the log information you have provided from the member you prefer as primary, the replica set member state timeline looks like:

  • Member with _id: 0 is started and performs WiredTiger consistency checks for the data files. The mongod transitions from replica set STARTUP (reading configuration) to STARTUP2 (enabling replication threads).
  • STARTUP2 completes and the member briefly transitions to RECOVERING and then SECONDARY state, where it is now ready to connect to other members of the replica set based on the last saved replica set configuration.
  • Network connectivity to the other members is not available (Connection refused), so the member remains in the SECONDARY state and marks the other members of the replica set as unreachable: Member mongo2:27017 is now in state RS_DOWN. Since only 1 of the 3 configured voting members is available, a primary cannot be elected, so this member stays SECONDARY and does not call for an election.
  • When network connectivity issues are resolved, the status of other members of the replica set can now be determined: Member mongo2:27017 is now in state SECONDARY.
  • The replica set now has connectivity between enough members to start an election. The reason for the election is also noted: Starting an election, since we've seen no PRIMARY in the past 10000ms.
  • The election succeeds (election succeeded, assuming primary role in term 42) and this member transitions from SECONDARY to PRIMARY state.
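
If you want to observe these transitions yourself, a quick mongo shell snippet to print each member's current state (as seen from the member you are connected to) is:

// Print each member's name and current replica set state,
// e.g. SECONDARY, PRIMARY, or "(not reachable/healthy)" while a member is down.
rs.status().members.forEach(function (m) {
  print(m.name + " -> " + m.stateStr)
})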

I need the primary to be writable at all times – I can't have it switch to secondary whenever one of the secondaries fails and has to reconnect.

The default configuration for a replica set is designed for automatic failover: if the current primary becomes isolated from a majority of replica set members, a new primary can be elected to continue write availability.

If you do not want automatic failover, set the votes for your secondaries to 0:

// Fetch the current configuration, remove the secondaries' votes,
// and apply the new configuration (run this on the primary).
var conf = rs.conf()
conf.members[1].votes = 0
conf.members[2].votes = 0
rs.reconfig(conf)

With only one voting member in the replica set configuration, the primary is now able to elect itself irrespective of the connectivity or health of the other replica set members.
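
As a quick sanity check after the reconfig, re-reading the configuration should show only the preferred primary still holding a vote (a small sketch; the output formatting is illustrative):

// Confirm that only the intended member still has a vote.
rs.conf().members.forEach(function (m) {
  print(m.host + "  priority=" + m.priority + "  votes=" + m.votes)
})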

There are a few obvious caveats with this configuration:

  • Since you only have a single voting node, a w:majority write concern is equivalent to w:1 and will only request acknowledgement from the primary.

  • If the primary remains disconnected from the other replica set members for any significant duration, the secondaries may become stale and require a full resync. You can mitigate this by increasing the size of the replication oplog for each of your replica set members in order to provide an appropriate buffer.
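
If you do run with this configuration, the following is a rough sketch of both caveats in practice: a w:"majority" write that is now acknowledged by the primary alone, and how to check the oplog and, on MongoDB 3.6 with WiredTiger, grow it without restarting. The 4096 MB target and the test/example namespace are illustrative values, not recommendations:

// With a single voting member, a "majority" write concern is satisfied
// by the primary alone.
db.getSiblingDB("test").example.insertOne({ x: 1 }, { writeConcern: { w: "majority" } })

// Check the configured oplog size (bytes) and the current oplog window.
db.getSiblingDB("local").oplog.rs.stats().maxSize
rs.printReplicationInfo()

// Grow the oplog to 4096 MB (illustrative value); MongoDB 3.6+, WiredTiger only.
db.adminCommand({ replSetResizeOplog: 1, size: 4096 })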