MongoDB Replica Set InitialSyncOplogSourceMissing error

aws mongodb-3.6 replication

I am setting up a three-node MongoDB replica set, with one node in AWS, using MongoDB 3.6.12. About 400 GB of data needs to sync to the secondary nodes. The initial sync starts fine; however, after the data sync process has been running for a day, my AWS node gets an InitialSyncOplogSourceMissing error:

Initial Sync Attempt Statistics: { failedInitialSyncAttempts: 9, maxFailedInitialSyncAttempts: 10, initialSyncStart: new Date(1565682109467), initialSyncAttempts: [ { durationMillis: 0, status: "InitialSyncOplogSourceMissing: No valid sync source found in current replica set to do an initial sync.", syncSource: ":27017" }, ... (the same entry repeated, 9 in total) ] }
2019-08-13T08:43:28.775+0100 E REPL     [replication-1] Initial sync attempt failed -- attempts left: 0 cause: InitialSyncOplogSourceMissing: No valid sync source found in current replica set to do an initial sync.
2019-08-13T08:43:28.775+0100 F REPL     [replication-1] The maximum number of retries have been exhausted for initial sync.
2019-08-13T08:43:28.775+0100 D EXECUTOR [replication-1] Executing a task on behalf of pool replication
2019-08-13T08:43:28.775+0100 D EXECUTOR [replication-0] Not reaping because the earliest retirement date is 2019-08-13T08:43:58.775+0100
2019-08-13T08:43:28.775+0100 D STORAGE  [replication-1] dropCollection: local.temp_oplog_buffer
2019-08-13T08:43:28.776+0100 D STORAGE  [replication-1] dropCollection: local.temp_oplog_buffer - dropAllIndexes start
2019-08-13T08:43:28.776+0100 D INDEX    [replication-1]      dropAllIndexes dropping: { v: 2, key: { _id: 1 }, name: "_id_", ns: "local.temp_oplog_buffer" }
2019-08-13T08:43:28.776+0100 D STORAGE  [replication-1] local.temp_oplog_buffer: clearing plan cache - collection info cache reset
2019-08-13T08:43:28.776+0100 D STORAGE  [replication-1] looking up metadata for: local.temp_oplog_buffer @ RecordId(113)
2019-08-13T08:43:28.776+0100 D STORAGE  [replication-1] WT begin_transaction for snapshot id 273
2019-08-13T08:43:28.776+0100 D STORAGE  [replication-1]  fetched CCE metadata: { md: { ns: "local.temp_oplog_buffer", options: { temp: true }, indexes: [ { spec: { v: 2, key: { _id: 1 }, name: "_id_", ns: "local.temp_oplog_buffer" }, ready: true, multikey: false, multikeyPaths: { _id: BinData(0, 00) }, head: 0, prefix: -1 } ], prefix: -1 }, idxIdent: { _id_: "local/index-21-7814325343802347057" }, ns: "local.temp_oplog_buffer", ident: "local/collection-20-7814325343802347057" }
2019-08-13T08:43:28.776+0100 D STORAGE  [replication-1] returning metadata: md: { ns: "local.temp_oplog_buffer", options: { temp: true }, indexes: [ { spec: { v: 2, key: { _id: 1 }, name: "_id_", ns: "local.temp_oplog_buffer" }, ready: true, multikey: false, multikeyPaths: { _id: BinData(0, 00) }, head: 0, prefix: -1 } ], prefix: -1 }
2019-08-13T08:43:28.776+0100 D STORAGE  [replication-1] looking up metadata for: local.temp_oplog_buffer @ RecordId(113)
2019-08-13T08:43:28.776+0100 D STORAGE  [replication-1] looking up metadata for: local.temp_oplog_buffer @ RecordId(113)
2019-08-13T08:43:28.776+0100 D STORAGE  [replication-1] recording new metadata: { md: { ns: "local.temp_oplog_buffer", options: { temp: true }, indexes: [], prefix: -1 }, idxIdent: {}, ns: "local.temp_oplog_buffer", ident: "local/collection-20-7814325343802347057" }
2019-08-13T08:43:28.776+0100 D STORAGE  [replication-1] looking up metadata for: local.temp_oplog_buffer @ RecordId(113)
2019-08-13T08:43:28.776+0100 D STORAGE  [replication-1]  fetched CCE metadata: { md: { ns: "local.temp_oplog_buffer", options: { temp: true }, indexes: [], prefix: -1 }, idxIdent: {}, ns: "local.temp_oplog_buffer", ident: "local/collection-20-7814325343802347057" }
2019-08-13T08:43:28.776+0100 D STORAGE  [replication-1] returning metadata: md: { ns: "local.temp_oplog_buffer", options: { temp: true }, indexes: [], prefix: -1 }
2019-08-13T08:43:28.776+0100 D STORAGE  [replication-1] looking up metadata for: local.temp_oplog_buffer @ RecordId(113)
2019-08-13T08:43:28.776+0100 D STORAGE  [replication-1]  fetched CCE metadata: { md: { ns: "local.temp_oplog_buffer", options: { temp: true }, indexes: [], prefix: -1 }, idxIdent: {}, ns: "local.temp_oplog_buffer", ident: "local/collection-20-7814325343802347057" }
2019-08-13T08:43:28.776+0100 D STORAGE  [replication-1] returning metadata: md: { ns: "local.temp_oplog_buffer", options: { temp: true }, indexes: [], prefix: -1 }
2019-08-13T08:43:28.777+0100 D STORAGE  [replication-1] dropCollection: local.temp_oplog_buffer - dropAllIndexes done
2019-08-13T08:43:28.777+0100 I STORAGE  [replication-1] Finishing collection drop for local.temp_oplog_buffer (no UUID).
2019-08-13T08:43:28.777+0100 D STORAGE  [replication-1] looking up metadata for: local.temp_oplog_buffer @ RecordId(113)
2019-08-13T08:43:28.777+0100 D STORAGE  [replication-1]  fetched CCE metadata: { md: { ns: "local.temp_oplog_buffer", options: { temp: true }, indexes: [], prefix: -1 }, idxIdent: {}, ns: "local.temp_oplog_buffer", ident: "local/collection-20-7814325343802347057" }
2019-08-13T08:43:28.777+0100 D STORAGE  [replication-1] returning metadata: md: { ns: "local.temp_oplog_buffer", options: { temp: true }, indexes: [], prefix: -1 }
2019-08-13T08:43:28.777+0100 D STORAGE  [replication-1] looking up metadata for: local.temp_oplog_buffer @ RecordId(113)
2019-08-13T08:43:28.777+0100 D STORAGE  [replication-1]  fetched CCE metadata: { md: { ns: "local.temp_oplog_buffer", options: { temp: true }, indexes: [], prefix: -1 }, idxIdent: {}, ns: "local.temp_oplog_buffer", ident: "local/collection-20-7814325343802347057" }
2019-08-13T08:43:28.777+0100 D STORAGE  [replication-1] returning metadata: md: { ns: "local.temp_oplog_buffer", options: { temp: true }, indexes: [], prefix: -1 }
2019-08-13T08:43:28.777+0100 D STORAGE  [replication-1] looking up metadata for: local.temp_oplog_buffer @ RecordId(113)
2019-08-13T08:43:28.777+0100 D STORAGE  [replication-1]  fetched CCE metadata: { md: { ns: "local.temp_oplog_buffer", options: { temp: true }, indexes: [], prefix: -1 }, idxIdent: {}, ns: "local.temp_oplog_buffer", ident: "local/collection-20-7814325343802347057" }
2019-08-13T08:43:28.777+0100 D STORAGE  [replication-1] returning metadata: md: { ns: "local.temp_oplog_buffer", options: { temp: true }, indexes: [], prefix: -1 }
2019-08-13T08:43:28.777+0100 D STORAGE  [replication-1] looking up metadata for: local.temp_oplog_buffer @ RecordId(113)
2019-08-13T08:43:28.777+0100 D STORAGE  [replication-1]  fetched CCE metadata: { md: { ns: "local.temp_oplog_buffer", options: { temp: true }, indexes: [], prefix: -1 }, idxIdent: {}, ns: "local.temp_oplog_buffer", ident: "local/collection-20-7814325343802347057" }
2019-08-13T08:43:28.777+0100 D STORAGE  [replication-1] returning metadata: md: { ns: "local.temp_oplog_buffer", options: { temp: true }, indexes: [], prefix: -1 }
2019-08-13T08:43:28.777+0100 D STORAGE  [replication-1] deleting metadata for local.temp_oplog_buffer @ RecordId(113)
2019-08-13T08:43:28.777+0100 D STORAGE  [replication-1] WT commit_transaction for snapshot id 273
2019-08-13T08:43:28.780+0100 D STORAGE  [replication-1] WT drop of  table:local/index-21-7814325343802347057 res 0
2019-08-13T08:43:28.780+0100 D STORAGE  [replication-1] ~WiredTigerRecordStore for: local.temp_oplog_buffer
2019-08-13T08:43:28.782+0100 D STORAGE  [replication-1] WT drop of  table:local/collection-20-7814325343802347057 res 0
2019-08-13T08:43:28.782+0100 E REPL     [replication-1] Initial sync failed, shutting down now. Restart the server to attempt a new initial sync.
2019-08-13T08:43:28.782+0100 F -        [replication-1] Fatal assertion 40088 InitialSyncOplogSourceMissing: No valid sync source found in current replica set to do an initial sync. at src\mongo\db\repl\replication_coordinator_impl.cpp 711
2019-08-13T08:43:28.782+0100 F -        [replication-1] 

Restarting the mongod instance leads to the same error immediately. If I force the replica set to reconfigure, all data already synced to this secondary node is deleted and another initial sync starts. What is the best course of action for this InitialSyncOplogSourceMissing error?
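One knob I noticed while investigating: the maxFailedInitialSyncAttempts: 10 in the statistics above appears to correspond to a tunable server parameter, so the retry budget can presumably be raised in mongod.conf (assuming the numInitialSyncAttempts parameter applies to 3.6; I have not verified this on my setup):

```yaml
# mongod.conf fragment (sketch, unverified): raise the initial sync retry
# budget from the default of 10 so transient sync-source loss is not fatal.
setParameter:
  numInitialSyncAttempts: 30
```

This would only buy more retries, though; it does not explain why no valid sync source is found in the first place.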

Best Answer

I managed to get this replica set up. It turns out there is a process on our AWS instance that restarts the mongod instance every day. Due to network latency, it takes about two days to sync all the data, so this error arose because mongod was forcibly shut down during the initial data-sync phase. Maybe something MongoDB needs to look into.
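A back-of-envelope check makes the failure mode concrete (the 400 GB figure is from the question and the restart interval is from the answer; the throughput number is an illustrative assumption for a WAN link):

```python
# Sketch: a scheduled restart that fires more often than the initial sync
# takes to finish guarantees the sync can never complete.

DATA_GB = 400                 # data to copy (from the question)
THROUGHPUT_GB_PER_HOUR = 8.5  # assumed effective rate over the WAN link
RESTART_INTERVAL_HOURS = 24   # the daily restart found in the answer

sync_hours = DATA_GB / THROUGHPUT_GB_PER_HOUR
print(f"estimated sync time: {sync_hours:.0f} h")                 # ~47 h
print("sync can complete:", sync_hours < RESTART_INTERVAL_HOURS)  # False
```

With roughly two days needed and a restart every day, every attempt was killed partway through, which is consistent with the 10 exhausted attempts in the log.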