When a MongoDB instance gets into a rollback state and the data to be rolled back is greater than 300MB, you have to intervene manually. The node will stay in the ROLLBACK state until you take action to save, remove, or move that data; the (now secondary) node should then be resynced to bring it back in line with the primary. This does not have to be a full resync, but a full resync is the simplest way.
Multiple rollbacks are a symptom of a problem rather than its cause. A rollback only happens when a secondary that was not in sync (either due to lag or an issue with replication) becomes primary and takes writes. So the problems that caused that to happen in the first place are what need to be addressed. The rollback itself is something you have to deal with as an admin - there are too many potential pitfalls for MongoDB to reconcile the data automatically.
If you want to simulate this again for testing purposes, I have outlined how to do so here:
http://comerford.cc/2012/05/28/simulating-rollback-on-mongodb/
Eventually, this data will be stored in a collection (in the local DB) rather than dumped to disk, which will present opportunities to deal with it more effectively:
https://jira.mongodb.org/browse/SERVER-4375
At the moment though, once a rollback occurs, as you found, manual intervention is required.
Finally, the manual now contains information similar to Kristina's blog:
https://docs.mongodb.com/manual/core/replica-set-rollbacks
OK, I think I know the answer. The oplog is a capped collection: it overwrites itself over time. The profiling level was set to 2 for a short period of time, logging all operations. I guess it will take time for those operations to be overwritten in the capped collection, and as that happens the oplog window will increase again.
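The reasoning above can be illustrated with a toy model of a capped collection as a fixed-size ring buffer. This is purely a sketch: the entry count, timestamps, and the CappedLog class are all invented for illustration (a real oplog is capped by size in bytes, not by entry count), but it shows why a burst of writes shrinks the window covered by the oplog and why the window recovers once the burst entries are overwritten by normal traffic.

```javascript
// Toy model: a capped collection as a ring buffer of N entries.
// All names and numbers here are illustrative assumptions, not MongoDB internals.
class CappedLog {
  constructor(maxEntries) {
    this.maxEntries = maxEntries;
    this.entries = [];
  }
  append(ts) {
    this.entries.push(ts);
    // Once full, the oldest entry is overwritten - this is what "capped" means.
    if (this.entries.length > this.maxEntries) this.entries.shift();
  }
  // The "oplog window": time span between oldest and newest retained entry.
  windowSeconds() {
    return this.entries[this.entries.length - 1] - this.entries[0];
  }
}

const log = new CappedLog(100);
let t = 0;

// Normal load: one write every 10 seconds -> 100 slots span ~990s.
for (let i = 0; i < 200; i++) { t += 10; log.append(t); }
console.log(log.windowSeconds()); // 990

// A burst of writes (one per second): the same 100 slots now span only ~99s,
// so the window shrinks sharply during the busy period.
for (let i = 0; i < 200; i++) { t += 1; log.append(t); }
console.log(log.windowSeconds()); // 99

// Back to normal load: as the burst entries are overwritten, the window recovers.
for (let i = 0; i < 200; i++) { t += 10; log.append(t); }
console.log(log.windowSeconds()); // 990
```

The key point is that the window is a function of write rate, not of wall-clock time, so it grows back on its own once the heavy-write period has been fully overwritten.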
I would be interested in anyone else's take on this issue.
Thank you.
Best Answer
If the drop command ran slowly, then it will be recorded in the logs (by default, operations taking >100ms are logged); otherwise the only record of it will be in the oplog (assuming you are running a replica set, even a single-node replica set), and that is assuming it did not occur so far in the past that it has "fallen out" of the oplog (which is a capped collection).
NOTE: Before running queries against your oplog, be aware that any such queries will be table scans, and will potentially be slow, especially when run on an active replica set with a large oplog. If you have a secondary available, you may wish to use that for this type of query rather than add load to your primary.
With that said, let's show an example. We will call the database "foo", drop it, and show how you would search the oplog for evidence of the drop. From the mongo shell, you would query the oplog for the dropDatabase command and inspect any matching entry.
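A sketch of what that search might look like (this is my reconstruction, not the original snippet): in the mongo shell you would query local.oplog.rs for a command ("op": "c") against the foo database's command namespace with dropDatabase in its body. The block below shows the shell form in comments, then applies the same predicate to sample oplog-style documents so the matching logic can be checked without a running replica set.

```javascript
// In the mongo shell (sketch, assuming a replica set named oplog at local.oplog.rs):
//
//   use local
//   db.oplog.rs.find({ op: "c", ns: "foo.$cmd", "o.dropDatabase": 1 })
//
// A matching entry, if found, would resemble:
//
//   { "ts": Timestamp(...), "op": "c", "ns": "foo.$cmd", "o": { "dropDatabase": 1 } }
//
// Below, the same predicate runs against hand-written sample documents.

const sampleOplog = [
  { op: "i", ns: "foo.bar", o: { _id: 1, x: 10 } },     // an ordinary insert
  { op: "c", ns: "foo.$cmd", o: { dropDatabase: 1 } },  // the drop we are hunting for
  { op: "c", ns: "baz.$cmd", o: { create: "things" } }, // an unrelated command
];

// True if this oplog entry records a dropDatabase of the named database.
function isDropOfDatabase(dbName, entry) {
  return entry.op === "c" &&
         entry.ns === dbName + ".$cmd" &&
         entry.o != null &&
         entry.o.dropDatabase === 1;
}

const matches = sampleOplog.filter((e) => isDropOfDatabase("foo", e));
console.log(matches.length); // 1
console.log(matches[0].ns);  // "foo.$cmd"
```

Note that the ns field for a database-level command is "<dbname>.$cmd", which is why the filter matches on "foo.$cmd" rather than a collection namespace.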
After that, the only other place to gather such evidence would be the filesystem, since the files are unlinked/deleted when the DB is dropped. There are plenty of answers on how to do that (and the potential problems) - I've used this one successfully in the past.