MongoDB unrecoverable replication error

mongodb replication

One member of my MongoDB replica set refused to restart, with the following error (reformatted for readability):

Starting rollback due to OplogStartMissing: 
our last op time fetched: (term: 30, timestamp: Jul 28 07:45:11:6) 
source's GTE:             (term: 31, timestamp: Jul 28 07:45:11:7)

Fatal assertion 18750 UnrecoverableRollbackError
                          (term: 31, timestamp: Jul 28 07:45:12:2) > our last optime: 
                          (term: 30, timestamp: Jul 28 07:45:11:6)

Let's call the instance where this happens M1, and the source it's trying to sync from M2. M1 used to be primary, then the primary switched to M2, and M1 restarted.

The naive interpretation of these log messages is that the first operation from M2's oplog is exactly the next operation after what we have applied on M1. So we should just happily apply operations from M2. Instead, MongoDB tries to roll back some operations, finds an operation that is in the future relative to both what we've applied and what's next on M2, and dies.

I have two questions: first, why is MongoDB trying to roll back in the first place, and second, where is the operation with the timestamp Jul 28 07:45:12:2 coming from?

Best Answer

Let me share my understanding of this.

Question 1: why is MongoDB trying to roll back in the first place?

When the primary changes, any writes that had not yet been replicated to the secondaries need to be rolled back. M1 was primary, and at the time of the switch some of its writes had not been synced; they are still in M1's oplog. While M1 was restarting, M2 became primary and the rest of the replica set members started syncing from the new primary. When M1 came back up, it found that its last operation had not been written to the other members, so that operation needs to be rolled back.
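To make the first log message more concrete, here is a rough Python sketch of the check as I understand it. It is a simplified model, not MongoDB's actual code: the syncing node asks the source for the first oplog entry at or after its own last fetched timestamp (the "source's GTE" in the log) and expects that entry to be the very operation it fetched last. If the term or timestamp differs, the histories have diverged and a rollback starts. The `OpTime` type, the function names, and the extra `(30, 07:45:11:5)` entry in M2's oplog are assumptions made up for illustration; only the two optimes from the log are taken from the question.

```python
from typing import List, NamedTuple, Optional, Tuple


class OpTime(NamedTuple):
    term: int
    ts: Tuple[int, int]  # (seconds, increment): a stand-in for a BSON Timestamp


def first_entry_at_or_after(oplog: List[OpTime], ts: Tuple[int, int]) -> Optional[OpTime]:
    """Return the first entry whose timestamp is >= ts (the "source's GTE")."""
    for entry in oplog:  # the oplog is assumed to be ordered by timestamp
        if entry.ts >= ts:
            return entry
    return None


def needs_rollback(last_fetched: OpTime, source_oplog: List[OpTime]) -> bool:
    """Simplified model: sync can continue only if the source still has
    exactly the entry we fetched last (same term and same timestamp)."""
    return first_entry_at_or_after(source_oplog, last_fetched.ts) != last_fetched


# 07:45:11 expressed as seconds since midnight, just to keep the numbers small.
T = 7 * 3600 + 45 * 60 + 11

M1_LAST_FETCHED = OpTime(term=30, ts=(T, 6))  # "our last op time fetched"
M2_OPLOG = [
    OpTime(term=30, ts=(T, 5)),  # hypothetical entry shared by both nodes
    OpTime(term=31, ts=(T, 7)),  # "source's GTE" from the log
]

print(first_entry_at_or_after(M2_OPLOG, M1_LAST_FETCHED.ts))
print(needs_rollback(M1_LAST_FETCHED, M2_OPLOG))
```

In this model M2 has no entry `(term 30, 07:45:11:6)`, so the lookup returns the term 31 entry instead and the check fails. That is why "the next operation on M2 is right after ours" is not enough: M1's own last operation has to exist on M2 as well.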

Question 2: where is the operation with the timestamp Jul 28 07:45:12:2 coming from?

Check this timestamp in oplog.rs. You should find your answer there, as it would be the same document (the one being rolled back).