I have a MongoDB cluster with 3 shards and no replication.
One of the shards has a bad disk, and as a result some files return an input/output error when I try to read them:
[root@mongodb2 b0]# md5sum collection-700-3655945817504191327.wt
md5sum: collection-700-3655945817504191327.wt: Input/output error
If I try to salvage the file with wt, that fails too.
I know exactly which documents are corrupt, but I am stuck because I cannot delete or update them. Any find() or remove() command on these documents causes the entire Mongo shard to be killed.
How can I recover this data?
Best Answer
OK, we "fixed" the issue, or at least solved the problem of Mongo crashing.
The .wt collection file was corrupt and not readable. We ran ddrescue on it and got rid of the bad sections in the file,
and then ran wt to salvage the file.
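For anyone wondering what the ddrescue step does in principle, here is a minimal Python sketch of the idea: copy the file block by block and zero-fill any block the disk refuses to read. This is not what we actually ran, and it is far cruder than ddrescue (no retries, no map file, no shrinking of the block size around bad areas). The function name `copy_skipping_bad_blocks` and the 4096-byte block size are my own assumptions for illustration, and it relies on `os.pread`, so it is Unix-only.

```python
import os


def copy_skipping_bad_blocks(src_path, dst_path, block_size=4096):
    """Copy src_path to dst_path, zero-filling blocks that cannot be read.

    Hypothetical helper for illustration only. Returns a list of
    (offset, length) ranges that raised an I/O error and were zero-filled.
    """
    bad_ranges = []
    size = os.path.getsize(src_path)  # metadata only, no data blocks are read
    src_fd = os.open(src_path, os.O_RDONLY)
    try:
        with open(dst_path, "wb") as dst:
            offset = 0
            while offset < size:
                want = min(block_size, size - offset)
                try:
                    data = os.pread(src_fd, want, offset)
                except OSError:
                    # Unreadable block: keep the file layout intact by
                    # writing zeros of the same length, and remember it.
                    dst.write(b"\x00" * want)
                    bad_ranges.append((offset, want))
                    offset += want
                    continue
                if not data:
                    break  # unexpected end of file
                dst.write(data)
                offset += len(data)  # handles short reads as well
    finally:
        os.close(src_fd)
    return bad_ranges
```

The returned list tells you which byte ranges were lost. The copy written to `dst_path` is the file you would then hand to wt for salvage, with mongod stopped and the original kept aside in case you need it again.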
That's it! The bad documents could not be recovered, but that is fine, since I can regenerate that data from my app. Most importantly, the Mongo server no longer crashes. Hopefully MongoDB will improve this logic in newer versions; crashing the database on any single I/O error is not very helpful.