MongoDB Replication initial sync fails with OplogStartMissing

mongodb replication

I'm currently testing replication in our development environment.

Our current (primary) mongo instance has around 70-80 GB of data. I created a new instance and added it to the replica set.

Syncing the collections takes around 4 hours, after which the new member starts replaying the oplog to catch up with writes on the primary. However, rs.printSlaveReplicationInfo() shows the secondary falling further and further behind the primary, and the sync eventually fails with the following error:

OplogStartMissing: error fetching oplog during initial sync :: caused by :: Our last optime fetched: { ts: Timestamp(1581379449, 5879), t: 4 }. source's GTE: { ts: Timestamp(1581379449, 20317), t: 4 }

At that point the initial sync starts over again. Both the primary and the secondary are running version 4.2.

Best Answer

How big is your oplog?

Could the oplog on the primary be filling up and overwriting the operations recorded since the sync started, before the initial sync has finished? If so, the new secondary will never catch up, because the entries it needs to replay are already gone. I had this error once before, but can’t recall the exact message now.

Check the oplog start / end points by using

rs.printReplicationInfo()
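
The comparison behind that check is simple arithmetic, sketched below in plain JavaScript. The function names and the sample figures (a 1024 MB oplog spanning 1 hour) are illustrative assumptions, not values from the question; only the 4-hour sync time comes from it. The window is the "log length start to end" value that rs.printReplicationInfo() reports, and the size estimate assumes the write rate stays roughly constant.

```javascript
// Hypothetical helpers; names and sample numbers are assumptions for illustration.

// windowHours: "log length start to end" from rs.printReplicationInfo(), in hours.
// syncHours: how long the collection copy phase of the initial sync takes.
function oplogCoversSync(windowHours, syncHours) {
  return windowHours > syncHours;
}

// Scale the oplog linearly so its window covers the sync plus a safety margin.
// Assumes a roughly constant write rate on the primary.
function requiredOplogMB(currentMB, windowHours, syncHours, safetyFactor = 2) {
  const mbPerHour = currentMB / windowHours;
  return Math.ceil(mbPerHour * syncHours * safetyFactor);
}

// Example: a 1024 MB oplog that spans 1 hour cannot cover a 4-hour sync.
console.log(oplogCoversSync(1, 4));        // false
console.log(requiredOplogMB(1024, 1, 4));  // 8192
```

When the reported window is shorter than the sync time, the operations the new member needs will already have been overwritten by the time it starts replaying the oplog.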

If this is the case, either increase the oplog size or pause the application (if possible).
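
As a rough sketch of the first option: on MongoDB 3.6 and later with the WiredTiger storage engine, the oplog can be resized online with the replSetResizeOplog admin command, run against the member whose oplog is too small (here, the sync source). The helper name resizeOplogCommand and the 16384 MB figure are assumptions for illustration; pick a size based on your own write rate.

```javascript
// Build the resize command document; size is given in megabytes.
// resizeOplogCommand is a hypothetical helper, not part of MongoDB.
function resizeOplogCommand(sizeMB) {
  return { replSetResizeOplog: 1, size: sizeMB };
}

// 16384 MB (16 GB) is an illustrative value only.
const cmd = resizeOplogCommand(16384);

// `db` only exists inside the mongo shell; elsewhere this just prints the command.
if (typeof db !== "undefined") {
  printjson(db.adminCommand(cmd));
} else {
  console.log(JSON.stringify(cmd));
}
```

Once the command has run, rs.printReplicationInfo() should report the new configured oplog size, and the initial sync can be restarted.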

I’ll dig through an old ticket and check the error I had with a sync failing.