I have a folder (technically in this case, a read-only mounted disk image) containing a ton of data I got by running Data Rescue (a data recovery app) on one of my large server drives. I did several different scan types and dumped all the files into one place. Data Rescue 'reconstructs' deleted files and often doesn't get it quite right. It can miscategorize the type of file it is, and it can mash separate files together.
I am seeking two specific PHP files (and maybe about 5 others if I get lucky). Most of these recovered files don't have names (0002, 0003, etc.), so I have to search by content.
I've come up with 6 different strings that should be able to identify these specific files. So I need a way to search the contents of files, not in an Apple "magic search" kinda way, but in an old school "manually read through every file looking for a string-match" kinda way.
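If keeping a separate output file per string isn't essential, `grep` can test all of the strings in a single pass over the data by reading them from a pattern file. A minimal sketch using a scratch directory and made-up file contents (`patterns.txt` and the sample names are hypothetical):

```shell
# Scratch directory with two sample files (names and contents are made up).
dir=$(mktemp -d)
printf 'color: #32cd32;\n' > "$dir/0002"
printf 'nothing to see here\n' > "$dir/0003"

# One literal string per line; -F treats them as fixed strings (not regexes)
# and -f reads them all from the file, so each data file is read only once
# no matter how many strings there are.
printf '#32cd32\nsome_function_name\n' > "$dir/patterns.txt"

# -l lists matching filenames, -i ignores case, -a searches binaries as text.
grep -l -i -a -F -f "$dir/patterns.txt" "$dir"/0002 "$dir"/0003
```

This prints the path of each file that contains at least one of the strings, which is usually what you want when hunting a handful of lost files.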
`grep` sounds like the obvious choice, but it's been nothing but problems. `grep` can search recursively, and it can decompress gzip, zip, and bzip2 archives, which is all good. But after a few minutes of running, it starts streaming "too many open files" errors. I'm not sure why; it's as if `grep` doesn't close a file after it opens it to search in it. I've also had issues with `grep` just stopping… not quitting, not crashing, not going unresponsive, but not using any more CPU, not reading anything from the disk, just sitting idle when it should be searching. I ALSO had trouble running multiple `grep` searches at once. `grep` reads files line by line, but a disk image may contain no newlines at all, so the entire thing gets loaded into memory as one enormous "line" before searching. But there is only one file in this whole bundle that is larger than the amount of RAM I have, so as long as I do one `grep` at a time, I should be fine.
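For what it's worth, a flood of "too many open files" errors usually means the process has hit the shell's soft limit on open file descriptors, which defaults to a low value on macOS. Checking and raising it for the current session is quick (a sketch; 1024 is an arbitrary value and may be capped by the hard limit on your system):

```shell
# Show the current soft limit on open file descriptors for this shell.
ulimit -n

# Raise the soft limit for this session; child processes such as grep
# inherit it. This only sticks for the current shell.
ulimit -n 1024

# Confirm the new limit took effect.
ulimit -n
```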
This is the command I'm using (wrapped in a script that runs several such commands to different output files, with some status output):
zfgrep -l -r -a -J -i -s -U -n "#32cd32" /Volumes/\'Storage\'\ Original\ Recovery > 32cd32.txt
This will run for a while, then it will hang. I'll get some results but not a full search. If I remove the `-s`, I get the flood of `too many open files` errors. Then, at someone else's suggestion, I used `find` to feed files to `grep` one at a time, like so:
find /Volumes/\'Storage\'\ Original\ Recovery -exec zfgrep -l -r -a -J -i -s -U -n "#32cd32" {} \; -print > 32cd32.txt
But that command has the exact same problems.
So this leaves me stuck. How can I search every single file on this disk image, including the archives, for some plain-text strings, including binary data files that may have been incorrectly merged with plain-text files? This doesn't seem like that tough a task for a modern multicore computer with a current OS, lots of RAM, and an SSD.
I actually would prefer a GUI option, but at this point I'll take any solution that works.
Also, I originally started trying to do this using BBEdit, but it was skipping a LOT of file types even when told to search all files, even files that are XML-based. I was very surprised by this.
Best Answer
Using `find ... -exec grep -r` effectively traverses the whole directory several times (once as part of the `find`, once as part of each `grep -r`), which may lead to the errors you see. So you should get rid of either the `find` or the `-r`. As you use the `grep` part to identify the files to be collected, it's probably the `-r` in your case.
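In other words, let `find` do the traversal and hand `grep` plain files in batches. A minimal sketch of the shape, using plain `grep` and a scratch directory standing in for the real volume (substitute `zfgrep` and your actual path to keep the archive decompression):

```shell
# Scratch tree standing in for the recovered volume (hypothetical files).
dir=$(mktemp -d)
mkdir -p "$dir/sub"
printf 'background: #32cd32;\n' > "$dir/0002"
printf 'unrelated bytes\n' > "$dir/sub/0003"

# find walks the tree exactly once; grep (no -r) only searches the files
# it is handed. "-type f" skips directories, and "{} +" batches many files
# into each grep invocation instead of forking one process per file.
find "$dir" -type f -exec grep -l -i -a "#32cd32" {} + > results.txt

cat results.txt
```

Because each `grep` invocation opens only the batch of files `find` passes it and exits when done, descriptors are released continuously instead of accumulating.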