MongoDB – Replica sets – Syncing and disconnect issue

Tags: mongodb, replication

I have 3 instances of MongoDB running on my local machine. One runs natively and the other two run in a VirtualBox Windows 7 VM.

LocalMongo = My local mongo.
VM1a and VM1b = My two instances of mongo running in my VM.

I was able to correctly set up a replica set across these instances and verified that they were operating and syncing to each other properly before I proceeded. This is my test setup for a failure within this system (a sketch of the initiation and the data import follows the steps):

Step 1: Start
LocalMongo (offline – the "original primary")
VM1a (online – new primary)
VM1b (online – secondary)

Step 2: Import data
LocalMongo (offline)
VM1a (online – push 1,000 documents into the collection "test")
VM1b (online)

Step 3: Reconnect "original primary" and sync to current primary
LocalMongo (online – secondary during the sync)
VM1a (online)
VM1b (online)

Step 4: Swap the "original primary" back into a primary
LocalMongo (online – primary)
VM1a (online – secondary)
VM1b (online – secondary)
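For reference, a replica set like the one described above can be initiated from the mongo shell roughly as follows. The hostnames and ports are taken from the rs.status() output further down; the set name "rs0" and the document shape used for the import are placeholders for illustration only.

// Run once against the member that should start out as primary (LocalMongo here).
// Replica set name "rs0" is assumed; the hosts match the rs.status() output below.
rs.initiate({
    _id : "rs0",
    members : [
        { _id : 0, host : "198.162.55.1:27017" },    // LocalMongo
        { _id : 1, host : "198.162.55.101:27017" },  // VM1a
        { _id : 2, host : "198.162.55.101:27018" }   // VM1b
    ]
})

// Step 2's import: push 1,000 documents into the "test" collection on the
// current primary (VM1a at that point).
for (var i = 0; i < 1000; i++) {
    db.test.insert({ _id : i, value : "document " + i });
}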

The two things I noticed that seem like odd behavior to me are the following:

1) Right as the swap happens and LocalMongo becomes the primary again, I notice that my connections to all of my MongoDB instances drop for 2-3 seconds. When I try to run rs.status() I get:

2014-09-17T14:56:15.632-0400 Socket recv() errno:10053 An established connection was aborted by the software in your host machine. 198.162.55.101:27017
2014-09-17T14:56:15.647-0400 SocketException: remote: 198.162.55.101:27017 error: 9001 socket exception [RECV_ERROR] server [198.162.55.101:27017]
2014-09-17T14:56:15.647-0400 DBClientCursor::init call() failed
2014-09-17T14:56:15.647-0400 Error: error doing query: failed at src/mongo/shell/query.js:81
2014-09-17T14:56:15.663-0400 trying reconnect to 198.162.55.101:27017 (198.162.55.101) failed
2014-09-17T14:56:15.679-0400 reconnect 198.162.55.101:27017 (198.162.55.101) ok

Is this normal behavior during the transition where they have to take themselves offline?

2) I noticed that after my LocalMongo became the primary again, VM1b (which always stays a secondary) points to VM1a as its syncing target, not the newly elected primary. This seems bad to me, but I'm still fairly new to Mongo, so I don't know if this is how the algorithm works when determining sync targets. Since both VM1a and VM1b exist within the VM, is Mongo just determining that VM1a is the "path of least resistance" and choosing it as the sync target for that reason?

{
    "_id" : 0,
    "name" : "198.162.55.1:27017",
    "health" : 1,
    "state" : 1,
    "stateStr" : "PRIMARY",
    "uptime" : 70,
    "optime" : Timestamp(1410979849, 637),
    "optimeDate" : ISODate("2014-09-17T18:50:49Z"),
    "lastHeartbeat" : ISODate("2014-09-17T18:56:59Z"),
    "lastHeartbeatRecv" : ISODate("2014-09-17T18:56:59Z"),
    "pingMs" : 1,
    "electionTime" : Timestamp(1410980169, 1),
    "electionDate" : ISODate("2014-09-17T18:56:09Z")
},
{
    "_id" : 1,
    "name" : "198.162.55.101:27017",
    "health" : 1,
    "state" : 2,
    "stateStr" : "SECONDARY",
    "uptime" : 4227,
    "optime" : Timestamp(1410979849, 637),
    "optimeDate" : ISODate("2014-09-17T18:50:49Z"),
    "infoMessage" : "syncing to: 198.162.55.1:27017",
    "self" : true
},
{
    "_id" : 2,
    "name" : "198.162.55.101:27018",
    "health" : 1,
    "state" : 2,
    "stateStr" : "SECONDARY",
    "uptime" : 4227,
    "optime" : Timestamp(1410979849, 637),
    "optimeDate" : ISODate("2014-09-17T18:50:49Z"),
    "lastHeartbeat" : ISODate("2014-09-17T18:56:59Z"),
    "lastHeartbeatRecv" : ISODate("2014-09-17T18:57:01Z"),
    "pingMs" : 1,
    "lastHeartbeatMessage" : "syncing to: 198.162.55.101:270

    "syncingTo" : "198.162.55.101:27017"
}
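For reference, here is a quick way to list every member's state and sync source from the same rs.status() document shown above (syncingTo is only populated for members that are currently syncing from someone):

// Print each member's name, state, and current sync source, if any.
rs.status().members.forEach(function (m) {
    print(m.name + "  " + m.stateStr + "  ->  " + (m.syncingTo || "no sync source"));
});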

My only concern in this situation is what happens if VM1a goes down right after LocalMongo comes up. Wouldn't that mean that the only place where data is actually being stored is the primary, and that it would create a broken link where VM1b (which never went down) isn't syncing from anything?

Thanks!

Best Answer

Is this normal behavior during the transition where they have to take themselves offline?

It is expected that your replica set members will drop network connections when there is a change in primary: when a primary steps down, it closes all open connections so that clients don't keep writing to a node that is no longer primary, and the shell reconnects automatically afterwards (as the "reconnect ... ok" line in your log shows).
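You can reproduce this deliberately from the shell. A minimal sketch (rs.stepDown() is the standard helper; the 60 is just an example step-down window in seconds):

// On the current primary: step down for 60 seconds and let the set elect a
// new primary. The shell's own connection is closed as part of the step-down,
// so you will see the same SocketException / "trying reconnect" messages from
// your log before the shell reconnects.
rs.stepDown(60)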

My only concern in this situation is what happens if VM1a goes down right after LocalMongo comes up. Wouldn't that mean that the only place where data is actually being stored is the primary, and that it would create a broken link where VM1b (which never went down) isn't syncing from anything?

Replica set secondaries can sync from other secondaries, which can help reduce load on your primary. If your secondary's sync source goes down, it will re-evaluate and switch its sync source to an available member.
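If you ever want to steer that choice yourself, for example to point VM1b directly at the new primary, the replSetSyncFrom command does that; a minimal sketch, using the primary's address from your rs.status() output:

// Run on the secondary whose sync source you want to change (VM1b here).
// The override is temporary: the member may choose a different source again
// later, e.g. after a restart or if the target falls too far behind.
db.adminCommand({ replSetSyncFrom : "198.162.55.1:27017" })

// rs.status() on that member should then report the new source in syncingTo.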