Whatever you do, do not shut down that `mongod` process until you have backed up your data (see below). There are missing files in that database directory, and I suspect they have been manually deleted at the OS level. The data files should never have gaps in them: you should have files starting at `myBase.0` all the way up to `myBase.37`, with no gaps in the numbering.
To explain: if you delete the files using `rm` or similar at the OS level, the deletion will succeed (the OS allows it), but because the running `mongod` process has an open file handle to those files, they will not actually be removed by the operating system until you stop the process.
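This open-file-handle behavior is generic POSIX, not MongoDB-specific, and can be demonstrated with a few shell commands (the temporary file path below is illustrative):

```shell
echo "important data" > /tmp/demo_datafile
exec 3< /tmp/demo_datafile   # open a read handle, like mongod holds on its data files
rm /tmp/demo_datafile        # the directory entry is gone...
cat <&3                      # ...but the open handle still reads the data: prints "important data"
exec 3<&-                    # once the last handle closes, the blocks are actually freed
```

This is exactly why the data is still recoverable while `mongod` is running, and why it disappears for good once the process exits.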
Here's an example of what the `lsof` command shows for a normal data file called `foo.0`:

```
mongod  5786  adam  mem  REG  9,0  67108864  805306654  /data/db/test0/foo.0
```

And here is what it looks like when you have manually deleted the file:

```
mongod  5786  adam  24u  REG  9,0  67108864  805306654  /data/db/test0/foo.0 (deleted)
```
From within MongoDB that file still exists and is accessible: I can query it, run `db.stats()`, etc. successfully. But if that `mongod` process is restarted, the file will be removed, and at that point the data is essentially gone (barring efforts to undelete at the filesystem level).
So, what should you do? Well, the first thing is to make sure you have a copy of the data before shutting down that process and losing it. To do that you have a couple of options:
- If this is a node in a replica set (even a single-node set), add a new secondary member and let it sync. The sync will still succeed, and you will then have a fully populated copy of the data ready to take over on that secondary. (Note: if this is not a replica set, you can't turn it into one without a process restart, and that restart would delete the data. My recommendation is to always run as a replica set, even a single node, for anything in production.)
- Run `mongodump` to dump the data out somewhere else before it gets deleted. This won't be fast, and you will need plenty of space, but at least it will give you an easily restorable copy of your data.
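As a rough sketch of both options (the hostnames and paths below are hypothetical, and the database name `myBase` is taken from the file names above):

```shell
# Option 1: from the primary, add a new secondary and let it sync a full copy.
# "newhost.example.net:27017" is a hypothetical hostname.
mongo --host localhost:27017 --eval 'rs.add("newhost.example.net:27017")'

# Option 2: dump the still-readable data out before the process is restarted.
mongodump --host localhost:27017 --db myBase --out /backup/myBase-dump
```

Either way, do this while the original `mongod` is still running, since the deleted files are only readable through its open handles.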
A repair of the database might work, but only if you have enough free space on that disk to accommodate 2x the data plus index size. It must be the `repairDatabase` command, not a restart with `--repair`, because the restart would cause the files to be deleted.
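In the mongo shell, the in-place repair looks like this (assuming the database is called `myBase`, as suggested by the file names above):

```javascript
// Run repairDatabase in place; do NOT restart mongod with --repair,
// because restarting releases the open handles and the deleted files are lost.
db.getSiblingDB("myBase").runCommand({ repairDatabase: 1 })
```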
Finally, you need to figure out what is deleting these files and stop it. Is there a cron job or other process that automatically deletes large files over a certain age or similar (the data files will usually be 2GB)? I've seen things like that wipe out MongoDB data files before, with similar results.
I will try to be brief; most of the answers are covered in the documentation.
Q1: There are many ways. Shard a collection using a hashed shard key on `_id`, set the initial number of chunks to 2, and then start inserting documents (100 is a good number). You will see documents on both shards.
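For example, from a mongo shell connected to a `mongos` (the database and collection names below are hypothetical):

```javascript
// Shard a collection on a hashed _id with 2 initial chunks, then insert 100 docs.
sh.enableSharding("test")
sh.shardCollection("test.docs", { _id: "hashed" }, false, { numInitialChunks: 2 })
for (var i = 0; i < 100; i++) {
    db.getSiblingDB("test").docs.insertOne({ n: i })
}
sh.status()   // both shards should now report chunks and documents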
Q2: It uses range partitioning. Each shard holds contiguous ranges of shard key values.
Q3: Sharding is an automated process; all you need to do is choose a shard key.
Q4: Stop the balancer, back up shards rs1 and rs2 and the config server, then restart the balancer.
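Sketched as shell commands (the hostnames and backup paths are hypothetical; rs1 and rs2 are the shard names from the question):

```shell
# 1. Stop the balancer via a mongos.
mongo --host mongos.example.net:27017 --eval 'sh.stopBalancer()'

# 2. Back up each shard replica set and the config server.
mongodump --host rs1/rs1a.example.net:27017 --out /backup/rs1
mongodump --host rs2/rs2a.example.net:27017 --out /backup/rs2
mongodump --host cfgsvr.example.net:27019 --out /backup/config

# 3. Restart the balancer.
mongo --host mongos.example.net:27017 --eval 'sh.startBalancer()'
```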
Q5: On a sharded cluster the access point should be the `mongos`.
Best Answer
If you only need to validate, the best combination would be to add `--dryRun --objcheck --verbose` (or `--dryRun --objcheck -vvvvv`) to your `mongorestore` command. When running the command for real, drop the `--dryRun` flag.
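As a sketch (the dump directory path is hypothetical), validation followed by the real restore might look like:

```shell
# Validate the dump without writing anything.
mongorestore --dryRun --objcheck --verbose /backup/dump

# Once validation passes, run the restore for real without --dryRun.
mongorestore --objcheck /backup/dump
```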