MongoDB – WT File Corruption Issue, Cannot Query Documents

Tags: files, linux, mongodb

I have a MongoDB cluster with 3 shards and no replication.
One of the shards has a bad disk, so some files give an input/output error when I try to read them:

[root@mongodb2 b0]# md5sum collection-700-3655945817504191327.wt
md5sum: collection-700-3655945817504191327.wt: Input/output error
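To find out which other files on the disk are affected, each one can be read in full and the failures listed. This is only a minimal sketch: the function name `list_unreadable_wt` is made up for illustration, and it assumes that a read hitting a bad sector makes `cat` exit non-zero, as `md5sum` does above.

```shell
#!/bin/sh
# Hypothetical helper: print every *.wt entry in a directory that cannot be
# read end to end. Anything cat fails on (I/O error, permission problem,
# not a regular file) is reported.
list_unreadable_wt() {
    dir=$1
    for f in "$dir"/*.wt; do
        [ -e "$f" ] || continue              # the glob matched nothing
        if ! cat "$f" > /dev/null 2>&1; then
            printf '%s\n' "$f"               # the read failed somewhere in this file
        fi
    done
}
```

Calling it as `list_unreadable_wt /path/to/shard/dbpath` prints one path per damaged file and nothing if every file reads cleanly. Note that this reads every byte of every collection file, so it can take a while on a large shard.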

If I try to use wt to salvage the file, that fails too.

I know exactly which documents are corrupt, but I am stuck because I cannot delete or update them. Any find() or remove() command on these documents causes the entire mongo shard to get killed.

How can I recover this data?

Best Answer

OK, we "fixed" the issue, or at least solved the problem of Mongo crashing.

The wt collection file was corrupt and unreadable. We ran ddrescue to copy it and get rid of the bad sections in the file:

ddrescue -n -e1 /database/mongocluster/shard2/b0/collection-72--3831537583791950739.wt RecoveredFiles/collection-72--3831537583791950739.wt RecoveredFiles/collection-72--3831537583791950739.wt.log
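The third argument to ddrescue (the `.log` file above) is its mapfile, which records which byte ranges were copied and which were not. To get a rough idea of how much data was lost, the unrecovered ranges can be summed. The sketch below assumes the mapfile layout described in the GNU ddrescue manual (comment lines starting with `#`, one status line, then `pos size status` lines with hex values, where `+` means rescued); the function name `unrecovered_bytes` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: sum the sizes of all blocks in a ddrescue mapfile
# whose status is anything other than "+" (rescued).
unrecovered_bytes() {
    mapfile_path=$1
    total=0
    seen_status=0
    while read -r pos size status rest; do
        case $pos in ''|'#'*) continue ;; esac   # skip blank and comment lines
        if [ "$seen_status" -eq 0 ]; then
            seen_status=1                        # first data line is the status line
            continue
        fi
        case $status in
            +) ;;                                # rescued block, nothing lost
            *) total=$(( total + $size )) ;;     # size is a hex constant such as 0x200
        esac
    done < "$mapfile_path"
    printf '%s\n' "$total"
}
```

For the command above, this would be run as `unrecovered_bytes RecoveredFiles/collection-72--3831537583791950739.wt.log`; the number printed is in bytes.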

And then ran wt to salvage the file:

./wt  -v -h /database/mongocluster/shard2/b0 -C "extensions=[./ext/compressors/snappy/.libs/libwiredtiger_snappy.so]" -R salvage collection-72--3831537583791950739.wt

That's it! The bad documents could not be recovered, but never mind, I can regenerate that data from my app. Most importantly, the mongo server no longer crashes. Hopefully MongoDB will improve this logic in newer versions; crashing the database on a single I/O error is not very helpful.