A secondary node in my MongoDB cluster has entered the RECOVERING state and is not coming out of it. Below is what I see in the log. I know one way to fix this is to reinitialize the secondary by deleting its data directory and restarting it, but I would rather not do that because I have 2 TB of data and the primary is receiving writes continuously.
2017-06-13T12:02:14.946+0000 I REPL [replication-12569] We are too stale to use mongodb.prod.mcse-reporting-olap.services.dal1.prod.walmart.com:27017 as a sync source. Blacklisting this sync source because our last fetched timestamp: 59351d47:3357 is before their earliest timestamp: 593f8b97:5b11 for 1min until: 2017-06-13T12:03:14.946+0000
2017-06-13T12:02:14.946+0000 I REPL [replication-12569] could not find member to sync from
2017-06-13T12:02:14.948+0000 E REPL [rsBackgroundSync] too stale to catch up -- entering maintenance mode
2017-06-13T12:02:14.948+0000 I REPL [rsBackgroundSync] Our newest OpTime : { ts: Timestamp 1496653127000|13143, t: 499 }
2017-06-13T12:02:14.948+0000 I REPL [rsBackgroundSync] Earliest OpTime available is { ts: Timestamp 1497336727000|23313, t: 502 }
2017-06-13T12:02:14.948+0000 I REPL [rsBackgroundSync] See http://dochub.mongodb.org/core/resyncingaverystalereplicasetmember
2017-06-13T12:02:14.948+0000 I REPL [rsBackgroundSync] going into maintenance mode with 11386 other maintenance mode tasks in progress
Best Answer
The link in the error message explains exactly what happened.
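To see how far behind the secondary is, the two timestamps in the "too stale" log line can be decoded. This is a minimal sketch, assuming the `seconds:increment` pair is hexadecimal, which matches the decimal `Timestamp` values printed a few lines later in the same log. The helper name `parse_optime` is made up for this illustration and is not part of any MongoDB tool.

```python
from datetime import datetime, timezone


def parse_optime(hex_ts: str):
    """Split a 'seconds:increment' pair (both hex) into (UTC datetime, counter).

    Hypothetical helper for this example only.
    """
    secs_hex, inc_hex = hex_ts.split(":")
    when = datetime.fromtimestamp(int(secs_hex, 16), tz=timezone.utc)
    return when, int(inc_hex, 16)


# Values copied from the log in the question.
ours, ours_inc = parse_optime("59351d47:3357")        # secondary's last fetched op
theirs, theirs_inc = parse_optime("593f8b97:5b11")    # oldest op the sync source still has

gap = theirs - ours  # span of operations that no longer exists in the source's oplog

print("secondary last fetched:", ours.isoformat())
print("sync source earliest:  ", theirs.isoformat())
print("missing oplog span:    ", gap)
print("too stale:             ", ours < theirs)
```

With the values from the question, the gap works out to roughly 7.9 days of operations that the sync source has already discarded, which is why the secondary cannot resume replication on its own.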
To avoid this in the future:
Investigate why the secondary fell so far behind. One possible cause is a heavier write load than usual.
Your oplog may be too small. Once the secondary's last applied operation is older than the oldest entry in the sync source's oplog, it can never catch up, because it has no way to fetch the operations in between.
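The oplog point can be made concrete with a rough estimate. The sketch below assumes that the newest oplog entry on the primary is about as recent as the wall-clock time of the log line (plausible, since the primary is written to continuously) and that the write rate is steady, so the oplog window scales linearly with its size. `CURRENT_OPLOG_GB` and `TARGET_WINDOW_H` are hypothetical placeholders, not values from the question, and both function names are invented for this example.

```python
from datetime import datetime, timezone


def oplog_window_hours(earliest_epoch_s: int, newest_epoch_s: int) -> float:
    """Hours of history covered between the oldest and newest oplog entries."""
    return (newest_epoch_s - earliest_epoch_s) / 3600.0


def scaled_oplog_size_gb(current_size_gb: float,
                         current_window_h: float,
                         target_window_h: float) -> float:
    """Oplog size needed for the target window, assuming a constant write rate."""
    return current_size_gb * target_window_h / current_window_h


# "Earliest OpTime available" from the log, in seconds since the epoch.
earliest = 1497336727

# Assumption: the primary's newest entry is roughly the time the log line was written.
newest = int(datetime(2017, 6, 13, 12, 2, 14, tzinfo=timezone.utc).timestamp())

window_h = oplog_window_hours(earliest, newest)

CURRENT_OPLOG_GB = 50.0   # hypothetical: replace with your real oplog size
TARGET_WINDOW_H = 72.0    # hypothetical: how long a secondary may be down

needed_gb = scaled_oplog_size_gb(CURRENT_OPLOG_GB, window_h, TARGET_WINDOW_H)

print(f"approximate oplog window: {window_h:.1f} h")
print(f"size for a {TARGET_WINDOW_H:.0f} h window: {needed_gb:.0f} GB")
```

Under these assumptions the sync source's oplog covered only about five hours at the time of the error, so any secondary outage or lag longer than that would leave it stale. On a live replica set, `rs.printReplicationInfo()` in the mongo shell reports the real oplog size and time window, which should be used instead of the placeholders here.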