Shell – How to cat the contents of files found using find into a single file


I managed to shoot myself where it hurts (really badly) by reformatting a partition that held valuable data. Of course it was not intentional, but it happened.

However, I managed to use testdisk and photorec to recover most of the data. So now I have all that data distributed over almost 25,000 directories. Most of the files are .txt files, while the rest are image files. There are more than 300 .txt files in each directory.

I can grep or use find to extract certain strings from the .txt files and output them to a file. For example, here's a line that I've used to verify that my data is in the recovered files:

find ./recup*/ -name '*.txt' -print0 | xargs -0 grep -i "searchPattern"

I can redirect that output to a file, but that only gives me the matching lines, not the rest of each file. Here's what I would really like to accomplish:

Go through all the files and look for a specific string. If that string is found in a file, cat ALL the contents of that file to an output file. If the pattern is found in more than one file, append the contents of each subsequent file to that output file. Note that I don't want to output just the pattern I'm searching for, but ALL the contents of every file in which the pattern is found.

I think this is doable, but I just don't know how to grab all the contents of a file after grepping it for a specific pattern.

Best Answer

If I understand your goal correctly, the following will do what you want:

find ./recup*/ -name '*.txt' -exec grep -qi "searchPattern" {} \; -exec cat {} \; > outputfile.txt

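This looks at every *.txt file under ./recup*/ and tests it for searchPattern (case-insensitively, because of -i). The first -exec acts as a test: it is true only when grep -q exits with status 0, i.e. when the file contains a match, so the second -exec cat runs only for matching files. The output of all the cat-ed files is redirected into outputfile.txt.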
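If you want to convince yourself that only matching files end up in the output before running this over all of your recovered data, here is a small throwaway demo. The directory names (recup_dir.1, recup_dir.2) and file contents are hypothetical stand-ins for PhotoRec output; the find command itself is the one above.

```shell
#!/bin/sh
# Demo of the find/grep/cat command on a throwaway directory.
# Directory and file names are hypothetical stand-ins for PhotoRec output.
set -e

demo_dir=$(mktemp -d)
trap 'rm -rf "$demo_dir"' EXIT
cd "$demo_dir"

mkdir recup_dir.1 recup_dir.2
printf 'line one\nsearchPattern here\n' > recup_dir.1/f0001.txt
printf 'nothing interesting\n'          > recup_dir.1/f0002.txt
printf 'SEARCHPATTERN in caps\n'        > recup_dir.2/f0003.txt

# The first -exec is true only when grep -q exits 0 (the file contains the
# pattern, case-insensitively because of -i), so cat runs only for those files.
find ./recup*/ -name '*.txt' -exec grep -qi "searchPattern" {} \; -exec cat {} \; > outputfile.txt

cat outputfile.txt
```

Both matching files (including the all-caps one, thanks to -i) are copied in full, while f0002.txt is skipped.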

Repeat for each pattern and output file.
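If you have more than a couple of patterns, a small loop saves retyping the command. This is a minimal sketch; the function name collect_by_pattern and the matches-PATTERN.txt naming scheme are hypothetical, and it assumes the patterns contain no / so they can be used in a file name.

```shell
#!/bin/sh
# collect_by_pattern PATTERN...
# Hypothetical helper: for each PATTERN, writes the full contents of every
# .txt file under ./recup*/ that contains PATTERN (case-insensitive) to
# matches-PATTERN.txt in the current directory.
collect_by_pattern() {
    for pattern in "$@"; do
        # "--" stops grep from treating a pattern that starts with "-" as an option.
        find ./recup*/ -name '*.txt' \
            -exec grep -qi -- "$pattern" {} \; \
            -exec cat {} \; > "matches-$pattern.txt"
    done
}

# Example usage (run from the directory that contains the recup* folders):
#   collect_by_pattern invoice "tax return" passport
```

Each pattern still costs one full pass over the files, so with many patterns the run time grows accordingly.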


If you have a very large number of directories matching ./recup*, you might end up with an "Argument list too long" error, because the shell expands ./recup*/ into one argument per directory before find even starts. A simple way around this is to let find do the matching itself:

find . -mindepth 2 -path './recup*.txt' -exec grep -qi "searchPattern" {} \; -exec cat {} \; > outputfile.txt

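This matches against the full path, so ./recup01234/foo/bar.txt will be matched (in -path patterns, * also matches /). The -mindepth 2 ensures that files sitting directly in the current directory, such as ./recup.txt or ./recup0.txt, are not matched.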
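To check which paths the -mindepth/-path filter selects before running the full command, you can try it on a throwaway tree. The file names below are made up to mirror the examples above, and the search starts from . so that every path begins with ./ and lines up with the './recup*.txt' pattern.

```shell
#!/bin/sh
# Show which files "find . -mindepth 2 -path './recup*.txt'" selects.
# All names are hypothetical, chosen to mirror the examples in the answer.
set -e

demo_dir=$(mktemp -d)
trap 'rm -rf "$demo_dir"' EXIT
cd "$demo_dir"

mkdir -p recup01234/foo other
: > recup.txt                # depth 1: skipped because of -mindepth 2
: > recup0.txt               # depth 1: skipped because of -mindepth 2
: > recup01234/foo/bar.txt   # depth 3: matched, since * in -path also matches /
: > other/baz.txt            # path does not start with ./recup: not matched

matches=$(find . -mindepth 2 -path './recup*.txt')
echo "$matches"
```

Only ./recup01234/foo/bar.txt is printed.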