MongoDB 4.2 – Oplog Incremental Backup Failure

Tags: mongodb, mongodb-4.2

I have scheduled an incremental oplog dump using mongodump, but it fails frequently on the majority of the servers. I have a sharded cluster with an oplog size of around 200GB and an oplog window of around 30-40 hours.

I suspect the cause is the message "WiredTiger record store oplog truncation", which appears just before the dump starts in every failure. At other times the dump runs perfectly fine. I have checked the oplog window and size and both look fine: the oldest oplog entry was much older than the start timestamp I pass to the oplog dump.

Below are some of the logs, from mongod.log and from my backup script:


2020-07-19T16:28:10.472+0000 I STORAGE  [WT RecordStoreThread: local.oplog.rs] WiredTiger record store oplog truncation finished in: 420ms

2020-07-19T16:28:10.502+0000 E QUERY    [conn74791] Plan executor error during find command: DEAD, stats: { stage: "COLLSCAN", filter: { $and: [ { ts: { $lte: Timestamp(1595176086, 2244) } }, { ts: { $gt: Timestamp(1595173520, 25) } } ] }, nReturned: 0, executionTimeMillisEstimate: 2970, works: 356598, advanced: 0, needTime: 356597, needYield: 0, saveState: 2787, restoreState: 2787, isEOF: 0, invalidates: 0, direction: "forward", docsExamined: 356596 }

oplog-prd-mon-XYZ-shard-hd03.c.XYZ-dr.internal-20200719.log-2020-07-19T16:28:06.794+0000 reading password from standard input
oplog-prd-mon-XYZ-shard-hd03.c.XYZ-dr.internal-20200719.log-Enter password:
oplog-prd-mon-XYZ-shard-hd03.c.XYZ-dr.internal-20200719.log-2020-07-19T16:28:07.153+0000 writing local.oplog.rs to stdout
oplog-prd-mon-XYZ-shard-hd03.c.XYZ-dr.internal-20200719.log-2020-07-19T16:28:09.795+0000 local.oplog.rs  0
oplog-prd-mon-XYZ-shard-hd03.c.XYZ-dr.internal-20200719.log-2020-07-19T16:28:10.517+0000 local.oplog.rs  0
oplog-prd-mon-XYZ-shard-hd03.c.XYZ-dr.internal-20200719.log:2020-07-19T16:28:10.517+0000 Failed: error writing data for collection `local.oplog.rs` to disk: error reading collection: Executor error during find command :: caused by :: errmsg: "CollectionScan died due to position in capped collection being deleted. Last seen record id: RecordId(6850019249918838732)"

Best Answer

You are right, the oplog truncation is killing your cursor.

There are no indexes on the oplog, so mongodump has to scan the collection from the very beginning to find the documents that match the query.

In this case the mongodump cursor had examined 356596 documents but still had not found any that matched the query (nReturned: 0).

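The failure message reports that the last seen record was "RecordId(6850019249918838732)". That number is a 64-bit value derived from the entry's ts: the high 32 bits are the timestamp's seconds since the epoch and the low 32 bits are its increment.

You can extract the high 32 bits, and with them the corresponding timestamp, from the shell: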

% echo $((6850019249918838732 >> 32))
1594894391

That date is Thu Jul 16 10:13:11 UTC 2020, still more than three days before the start of the queried range (Timestamp(1595173520, 25), which is Sun Jul 19 15:45:20 UTC 2020).
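The decoding can be scripted so that the distance between the cursor's last position and the query's lower bound is explicit. This is a minimal sketch: the function names are made up for illustration, and it assumes the RecordId packs the entry's ts as (seconds << 32) | increment.

```shell
#!/usr/bin/env bash
# Split an oplog RecordId into its two 32-bit halves.
# Assumption: the id packs the entry's ts as (seconds << 32) | increment.
decode_record_id() {
  local id=$1
  echo "$(( id >> 32 )) $(( id & 0xFFFFFFFF ))"
}

# Seconds between the last record the cursor saw and the query's lower bound.
gap_seconds() {
  local id=$1 lower_bound=$2
  echo $(( lower_bound - (id >> 32) ))
}

# Values taken from the logs in the question.
decode_record_id 6850019249918838732          # prints: 1594894391 1996
gap_seconds 6850019249918838732 1595173520    # prints: 279129
```

A gap of 279129 seconds is a little over 77 hours of oplog that the scan still had to walk through before reaching the first matching entry.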

When those documents were trimmed from the capped collection, the cursor no longer pointed to a valid position, so it was not possible to continue.

When that happens, you'll need to restart the mongodump so it can start scanning from the new beginning of the oplog.
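One way to automate that restart is a small retry wrapper around the dump step. The sketch below is not part of mongodump. The retry function, its arguments and the dump_oplog name in the usage comment are hypothetical, and the attempt count and delay would need tuning for a real backup job.

```shell
#!/usr/bin/env bash
# Re-run a command until it succeeds or the attempt budget is used up.
# Usage: retry <max_attempts> <delay_seconds> <command> [args...]
retry() {
  local max_attempts=$1 delay=$2
  shift 2
  local attempt=1
  until "$@"; do
    if (( attempt >= max_attempts )); then
      echo "giving up after ${attempt} attempts" >&2
      return 1
    fi
    echo "attempt ${attempt} failed; retrying in ${delay}s" >&2
    sleep "$delay"
    attempt=$(( attempt + 1 ))
  done
}

# Hypothetical usage: HOST, QUERY and OUT are placeholders, auth options omitted.
# Each attempt rewrites the temp file, so output from a failed run is discarded.
# dump_oplog() { mongodump --host "$HOST" -d local -c oplog.rs --query "$QUERY" -o - > "$OUT.tmp" && mv "$OUT.tmp" "$OUT"; }
# retry 3 60 dump_oplog
```

Writing to a temporary file and renaming it only on success means a killed cursor never leaves a partial dump in place of a good one.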

Related Solutions

Mongodb – How we can take backup oplog on every hour and apply on top on full backup for mongodb

The oplog is idempotent: you can replay the operations in it as many times as you want without getting duplicates or other issues, unless you apply them to a set of data files in an inconsistent state.

However, it should be noted that as long as you have the journal as part of the LVM snapshot, re-running the oplog is not necessary for a consistent backup.

With that said, if you do have a copy of the oplog, you can "replay" it in two ways:

Start up a new MongoDB instance, use it to access the oplog backup, and then create a custom script to read each operation and apply it in order (has the benefit of allowing you to filter and being rather quick)
Use mongodump/mongorestore to replay the oplog (see below)

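To use mongodump and mongorestore in this way, you first have to dump the oplog out in BSON format so that it can be restored. (The --dbpath option shown here was removed from mongodump in MongoDB 3.0; on later versions, start a mongod on those data files and point mongodump at it with --host instead.)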

mongodump --dbpath /path/to/folder/with/oplog/files -d local -c oplog.rs -o oplogDump

Next you move the bson file out of the folder (this is basically for convenience):

mkdir oplogRestore
mv oplogDump/local/oplog.rs.bson oplogRestore/oplog.bson

Now you can use mongorestore to replay the oplog and apply it to a given running instance:

mongorestore --host host:port --oplogReplay oplogRestore

You can also use the oplogLimit option if you only wish to replay up to a certain point. See Asya's excellent answer here for more on that.
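As a rough illustration of the oplogLimit form, the value is written as seconds since the epoch, optionally followed by a colon and an ordinal, and mongorestore skips entries at or after that point. The oplog_limit helper below is hypothetical, the timestamp is only an example value, and the command is printed rather than executed.

```shell
#!/usr/bin/env bash
# Format a cut-off for --oplogLimit as <seconds>:<ordinal>.
# The ordinal defaults to 0 when only the seconds are given.
oplog_limit() {
  local seconds=$1 ordinal=${2:-0}
  printf '%s:%s\n' "$seconds" "$ordinal"
}

# Example cut-off; replace with the point in time you want to stop before.
limit=$(oplog_limit 1595176086 2244)

# Dry run: print the command instead of running it.
echo mongorestore --host host:port --oplogReplay --oplogLimit "$limit" oplogRestore
```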

Please note that none of this will be particularly quick, nor is it strictly necessary (as mentioned above).

MongoDB – Understanding Instances and Oplog

I would like to create a replica-set with oplog

According to MongoDB in Action, replica sets rely on two basic mechanisms: an oplog and a heartbeat. The oplog enables the replication of data, and the heartbeat monitors health and triggers failover.

At the heart of MongoDB’s replication stands the oplog. The oplog is a capped collection that lives in a database called local on every replicating node and records all changes to the data. Every time a client writes to the primary, an entry with enough information to reproduce the write is automatically added to the primary’s oplog. Once the write is replicated to a given secondary, that secondary’s oplog also stores a record of the write. Each oplog entry is identified with a BSON timestamp, and all secondaries use the timestamp to keep track of the latest entry they’ve applied.

Applications sometimes query secondary nodes for read scaling. If that’s happening, this kind of failure will cause read failures, so it’s important to design your application with failover in mind.

Let’s look more closely at a real oplog and at the operations recorded in it. First connect with the shell to the primary node and switch to the local database.

amj:PRIMARY> use local
switched to db local

The local database stores all the replica set metadata and the oplog. Naturally, this database isn’t replicated itself. Thus it lives up to its name: data in the local database is supposed to be unique to the local node and therefore shouldn’t be replicated. If you examine the local database, you’ll see a collection called oplog.rs, which is where every replica set stores its oplog. You’ll also see a few system collections. Here’s the complete output:

amj:PRIMARY> show collections
me
oplog.rs
replset.minvalid
slaves
startup_log
system.indexes
system.replset

replset.minvalid contains information for the initial sync of a given replica set member, and system.replset stores the replica set config document. Not all of your mongod servers will have the replset.minvalid collection. me and slaves are used to implement write concerns, and system.indexes is the standard index spec container.

Right now only one mongod instance is running on the server, serving all applications; should I run different mongod instances (one for every application)?

I don't think it's a good idea to run a separate mongod instance for each application.

