MongoDB – Mongo replication sync to newer version

migration, mongodb, replication

I have one mongo v2.4.10 server with 1.7TB of data, and I am trying to migrate and upgrade it to a mongo v3.0.15 server.

I've set up a new mongo v3.0.15 server and configured replication so that the v3.0.15 node is a secondary syncing from the v2.4.10 primary.

The secondary was in STARTUP2 and the sync was almost finished, judging by the growth in storage usage on the new machine running mongo v3.0.15.

However, there were some socket exceptions which caused both of my machines to resync from the start. Is there anything I can configure or set up to prevent the error from happening again? I don't want to waste another 7 days on a failed attempt to sync 1.7TB.

Below are some logs from my mongo:

Primary mongo (v2.4.10):

Wed Jul 3 10:03:59.196 [conn21] SocketException handling request, closing client connection: 9001 socket exception [SEND_ERROR] server [x.x.x.x:32829]

Secondary mongo (v3.0.15):


...
2019-07-03T09:54:29.169+0800 I NETWORK [ReplExecNetThread-0] Socket recv() timeout 192.168.168.122:27017
2019-07-03T09:54:29.169+0800 I NETWORK [ReplExecNetThread-0] SocketException: remote: 192.168.168.122:27017 error: 9001 socket exception [RECV_TIMEOUT] server [192.168.168.122:27017]
2019-07-03T09:54:29.169+0800 I NETWORK [ReplExecNetThread-0] DBClientCursor::init call() failed
2019-07-03T09:54:29.169+0800 I REPL [ReplicationExecutor] Error in heartbeat request to 192.168.168.122:27017; Location10276 DBClientBase::findN: transport error: 192.168.168.122:27017 ns: admin.$cmd query: { replSetHeartbeat: "ArchiverReplica", pv: 1, v: 1, from: "x.x.x.x:27017", fromId: 1, checkEmpty: false }

2019-07-03T09:54:29.170+0800 W NETWORK [ReplExecNetThread-0] Failed to connect to 192.168.168.122:27017 after 1 milliseconds, giving up.
2019-07-03T09:54:29.170+0800 I REPL [ReplicationExecutor] Error in heartbeat request to 192.168.168.122:27017; Location18915 Failed attempt to connect to 192.168.168.122:27017; couldn't connect to server 192.168.168.122:27017 (192.168.168.122), connection attempt failed
...
2019-07-03T10:07:41.452+0800 W NETWORK [ReplExecNetThread-0] Failed to connect to 192.168.168.122:27017 after 4995 milliseconds, giving up.
2019-07-03T10:07:41.452+0800 I REPL [ReplicationExecutor] Error in heartbeat request to 192.168.168.122:27017; Location18915 Failed attempt to connect to 192.168.168.122:27017; couldn't connect to server 192.168.168.122:27017 (192.168.168.122), connection attempt failed
2019-07-03T10:07:43.602+0800 I REPL [ReplicationExecutor] Member 192.168.168.122:27017 is now in state PRIMARY
...
2019-07-03T10:08:03.845+0800 I NETWORK [rsSync] Socket recv() errno:104 Connection reset by peer 192.168.168.122:27017
2019-07-03T10:08:03.845+0800 I NETWORK [rsSync] SocketException: remote: 192.168.168.122:27017 error: 9001 socket exception [RECV_ERROR] server [192.168.168.122:27017]
2019-07-03T10:08:03.853+0800 I NETWORK [rsSync] trying reconnect to 192.168.168.122:27017 (192.168.168.122) failed
2019-07-03T10:08:03.928+0800 I NETWORK [rsSync] reconnect 192.168.168.122:27017 (192.168.168.122) ok
2019-07-03T10:08:03.939+0800 E REPL [rsSync] 16465 recv failed while exhausting cursor
2019-07-03T10:08:03.939+0800 E REPL [rsSync] initial sync attempt failed, 9 attempts remaining
2019-07-03T10:08:08.939+0800 I REPL [rsSync] initial sync pending
2019-07-03T10:08:08.958+0800 I REPL [ReplicationExecutor] syncing from: 192.168.168.122:27017
2019-07-03T10:08:09.204+0800 I REPL [rsSync] initial sync drop all databases
2019-07-03T10:08:09.205+0800 I STORAGE [rsSync] dropAllDatabasesExceptLocal 3
2019-07-03T10:08:09.221+0800 I JOURNAL [rsSync] journalCleanup...
2019-07-03T10:08:09.221+0800 I JOURNAL [rsSync] removeJournalFiles
2019-07-03T10:08:09.895+0800 I JOURNAL [rsSync] journalCleanup...
2019-07-03T10:08:09.895+0800 I JOURNAL [rsSync] removeJournalFiles
...
(the resync then starts again from the beginning)

Best Answer

To upgrade to MongoDB 3.0 all members of a replica set must be running the previous major release (MongoDB 2.6). See: Upgrade a replica set to 3.0.

A replica set configuration mixing 2.4 and 3.0 nodes is not supported or tested and will likely lead to unexpected errors. Since you have 1.7TB of data, I would definitely follow the recommended upgrade instructions in the MongoDB documentation and upgrade all members one major release series at a time (2.4 => 2.6, then 2.6 => 3.0).
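The "one major release series at a time" rule can be sketched as a small helper. This is only an illustration, not a MongoDB tool: the function names and the `RELEASE_SERIES` list are hypothetical, and the idea that two members can safely coexist only when they are at most one series apart is an assumption based on the upgrade guidance above.

```python
# Illustrative sketch only: these helpers are hypothetical, not part of MongoDB.
# Release series in chronological order (list stops at 3.6 for brevity).
RELEASE_SERIES = ["2.4", "2.6", "3.0", "3.2", "3.4", "3.6"]


def release_series(version):
    """Return the release series of a version string, e.g. '2.4.10' -> '2.4'."""
    major, minor = version.lstrip("v.").split(".")[:2]
    return f"{major}.{minor}"


def upgrade_path(current, target):
    """List the release series to step through, one at a time, to reach target."""
    start = RELEASE_SERIES.index(release_series(current))
    end = RELEASE_SERIES.index(release_series(target))
    if end < start:
        raise ValueError("downgrade paths are not covered by this sketch")
    return RELEASE_SERIES[start + 1:end + 1]


def is_single_step(version_a, version_b):
    """Assumption: two members may be mixed only if at most one series apart."""
    a = RELEASE_SERIES.index(release_series(version_a))
    b = RELEASE_SERIES.index(release_series(version_b))
    return abs(a - b) <= 1


if __name__ == "__main__":
    print(upgrade_path("2.4.10", "3.0.15"))    # ['2.6', '3.0']
    print(is_single_step("2.4.10", "3.0.15"))  # False
```

For the setup in the question, the helper reports two steps (2.6, then 3.0) and flags the 2.4.10 primary with a 3.0.15 secondary as more than one series apart.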

Since it sounds like you are starting from a single node, it would be more straightforward to:

Ideally you should upgrade to a supported version of MongoDB (currently 3.4 or newer, although 3.4 will reach EOL in Jan 2020). Changes in successive releases have improved initial sync and 3.4 would be a much better starting point. MongoDB 3.4 also includes optional network compression for communication between mongod instances (which is enabled by default in 3.6+).