I've been trying to add a new node to a replica set without success due to these errors:
2019-08-12T06:48:15.031+0200 I ASIO [NetworkInterfaceASIO-RS-0]
Ending connection to host mongo03:27017 due to bad connection status;
1 connections to that host remain open 2019-08-12T06:48:15.031+0200 I
REPL [replication-224] Restarting oplog query due to error:
NetworkInterfaceExceededTimeLimit: error in fetcher batch callback:
Operation timed out. Last fetched optime (with hash): { ts:
Timestamp(1565585283, 1317), t: 9 }[1741433982986566690]. Restarts
remaining: 10 2019-08-12T06:48:15.032+0200 I REPL
[replication-224] Scheduled new oplog query Fetcher source:
mongo03:27017 database: local query: { find: "oplog.rs", filter: { ts:
{ $gte: Timestamp(1565585283, 1317) } }, tailable: true, oplogReplay:
true, awaitData: true, maxTimeMS: 2000, batchSize: 13981010, term: 9,
readConcern: { afterOpTime: { ts: Timestamp(1565585283, 1317), t: 9 }
} } query metadata: { $replData: 1, $oplogQueryData: 1,
$readPreference: { mode: "secondaryPreferred" } } active: 1
findNetworkTimeout: 7000ms getMoreNetworkTimeout: 10000ms shutting
down?: 0 first: 1 firstCommandScheduler: RemoteCommandRetryScheduler
request: RemoteCommand 2262327 — target:mongo03:27017 db:local cmd:{
find: "oplog.rs", filter: { ts: { $gte: Timestamp(1565585283, 1317) }
}, tailable: true, oplogReplay: true, awaitData: true, maxTimeMS:
2000, batchSize: 13981010, term: 9, readConcern: { afterOpTime: { ts:
Timestamp(1565585283, 1317), t: 9 } } } active: 1
callbackHandle.valid: 1 callbackHandle.cancelled: 0 attempt: 1
retryPolicy: RetryPolicyImpl maxAttempts: 1 maxTimeMillis: -1ms
2019-08-12T06:48:17.637+0200 I ASIO [NetworkInterfaceASIO-RS-0]
Connecting to mongo03:27017 2019-08-12T06:48:17.645+0200 I ASIO
[NetworkInterfaceASIO-RS-0] Successfully connected to mongo03:27017,
took 8ms (2 connections now open to mongo03:27017)
2019-08-12T06:49:36.961+0200 I – [repl writer worker 13]
DB.CollectionName collection clone progress: 221390504/643329790
34% (documents copied) 2019-08-12T06:51:52.859+0200 I ASIO
[NetworkInterfaceASIO-RS-0] Ending connection to host mongo03:27017
due to bad connection status; 1 connections to that host remain open
2019-08-12T06:51:52.859+0200 I REPL [replication-229] Restarting
oplog query due to error: NetworkInterfaceExceededTimeLimit: error in
fetcher batch callback: Operation timed out. Last fetched optime (with
hash): { ts: Timestamp(1565585502, 204), t: 9 }[3285716995067894969].
Restarts remaining: 10 2019-08-12T06:51:52.859+0200 I ASIO
[NetworkInterfaceASIO-RS-0] Connecting to mongo03:27017
2019-08-12T06:51:52.859+0200 I REPL [replication-229] Scheduled
new oplog query Fetcher source: mongo03:27017 database: local query: {
find: "oplog.rs", filter: { ts: { $gte: Timestamp(1565585502, 204) }
}, tailable: true, oplogReplay: true, awaitData: true, maxTimeMS:
2000, batchSize: 13981010, term: 9, readConcern: { afterOpTime: { ts:
Timestamp(1565585502, 204), t: 9 } } } query metadata: { $replData: 1,
$oplogQueryData: 1, $readPreference: { mode: "secondaryPreferred" } }
active: 1 findNetworkTimeout: 7000ms getMoreNetworkTimeout: 10000ms
shutting down?: 0 first: 1 firstCommandScheduler:
RemoteCommandRetryScheduler request: RemoteCommand 2272962 —
target:mongo03:27017 db:local cmd:{ find: "oplog.rs", filter: { ts: {
$gte: Timestamp(1565585502, 204) } }, tailable: true, oplogReplay:
true, awaitData: true, maxTimeMS: 2000, batchSize: 13981010, term: 9,
readConcern: { afterOpTime: { ts: Timestamp(1565585502, 204), t: 9 } }
} active: 1 callbackHandle.valid: 1 callbackHandle.cancelled: 0
attempt: 1 retryPolicy: RetryPolicyImpl maxAttempts: 1 maxTimeMillis:
-1ms
I'm a little bit lost regarding how to debug it.
It's a 250GB database, the synchronization is being done through a private network 100Mb. There is an iptables in all servers but the traffic from the private network is allowed:
-A INPUT -i eth1 -p tcp -m tcp --dport 27017 -m state --state NEW,ESTABLISHED -j ACCEPT
At some point the replication process fails:
2019-08-12T02:22:37.582+0200 I REPL [replication-143] Restarting
oplog query due to error: ExceededTimeLimit: error in fetcher batch
callback: Executor error during find command :: caused by :: errmsg:
"oper ation exceeded time limit". Last fetched optime (with hash): {
ts: Timestamp(1565569328, 710), t: 9 }[7908135344534364729]. Restarts
remaining: 1 2019-08-12T02:22:37.582+0200 I REPL [replication-143]
Scheduled new oplog query Fetcher source: mongo03:27017 database:
local query: { find: "oplog.rs", filter: { ts: { $gte:
Timestamp(1565569328, 710) } }, tailable: true, oplogReplay: true,
awaitData: true, maxTimeMS: 2000, batchSize: 13981010, term: 9,
readConcern: { afterOpTime: { ts: Timestamp(1565569328, 710), t: 9 } }
} query metadata: { $replData: 1, $oplogQ ueryData: 1,
$readPreference: { mode: "secondaryPreferred" } } active: 1
findNetworkTimeout: 7000ms getMoreNetworkTimeout: 10000ms shutting
down?: 0 first: 1 firstCommandScheduler: RemoteCommandRetryScheduler
re quest: RemoteCommand 1381614 — target:mongo03:27017 db:local cmd:{
find: "oplog.rs", filter: { ts: { $gte: Timestamp(1565569328, 710) }
}, tailable: true, oplogReplay: true, awaitData: true, maxTimeMS:
2000, ba tchSize: 13981010, term: 9, readConcern: { afterOpTime: { ts:
Timestamp(1565569328, 710), t: 9 } } } active: 1 callbackHandle.valid:
1 callbackHandle.cancelled: 0 attempt: 1 retryPolicy: RetryPolicyImpl
maxAttem pts: 1 maxTimeMillis: -1ms 2019-08-12T02:22:39.612+0200 I
REPL [replication-142] Error returned from oplog query (no more
query restarts left): ExceededTimeLimit: error in fetcher batch
callback: Executor error during find command :: caused by :: errmsg:
"operation exceeded time limit" 2019-08-12T02:22:39.615+0200 I REPL
[replication-142] Finished fetching oplog during initial sync:
ExceededTimeLimit: error in fetcher batch callback: Executor error
during find command :: caused by :: errms g: "operation exceeded time
limit". Last fetched optime and hash: { ts: Timestamp(1565569328,
710), t: 9 }[7908135344534364729] 2019-08-12T02:22:43.034+0200 I REPL
[replication-142] CollectionCloner ns:DB.CollectionName finished
cloning with status: IllegalOperation: AsyncResultsMerger killed
2019-08-12T02:22:44.551+0200 W REPL [replication-142] collection
clone for 'DB.CollectionName' failed due to IllegalOperation: While
cloning collection 'DB.CollectionName' there was an error
'AsyncResultsMerger killed' 2019-08-12T02:22:44.551+0200 I REPL
[replication-142] CollectionCloner::start called, on ns:DB.CollectionName2
2019-08-12T02:22:44.551+0200 W REPL [replication-142] database
'dbName' (9 of 9) clone failed due to ShutdownInProgress: collection
cloner completed 2019-08-12T02:22:44.551+0200 I REPL
[replication-142] Finished cloning data: ShutdownInProgress:
collection cloner completed. Beginning oplog replay.
2019-08-12T02:22:44.551+0200 I REPL [replication-142] Initial sync
attempt finishing up.
And it has to start over again.
Don't know if I could configure a bigger time out to avoid this disconnections.
Thanks in advance.
Best Answer
Thank you!
-Anban