Based on this script:
find . -name "*.txt" | grep 'LINUX/UNIX'
and
find . -name "*.txt" | grep 'LINUX/UNIX' | xargs cp <to a path>
from Here
, I can grep files for a certain string and copy those that contain it into one directory, but they are then kept as separate files. How can I cat these files into one coherent document?
Example
What I have in mind is the following: I have an archive of quotations spread out in separate files across hundreds of folders, the name of the folders being the respective topic. So "philosophy/ontology/concepts/aletheia/notes.tex" will contain all my notes on the philosophical concept of aletheia etc.
They all follow the same naming convention (the name is always notes.tex), so grepping them is easy. I can search them via grep, but I would like a script which not only finds them, but also concatenates all files that contain the respective string into one large file.
Best Answer
To select regular files with names matching *.txt, in the current directory or below, that contain a particular string (as opposed to files containing matches of a particular regular expression), and to concatenate these files together in the order they are found, you may use either of two commands that combine find with a grep -q test.
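A sketch of two such commands, using find's -exec both as a per-file grep test and to run cat (the exact invocations are an assumption, reconstructed from the description that follows; ';' runs cat once per file, '+' batches arguments):

```shell
# Test each regular *.txt file with grep -qF; cat only the files that matched.
find . -name '*.txt' -type f \
    -exec grep -qF 'LINUX/UNIX' {} ';' \
    -exec cat {} ';' > myfile
```

or, batching the cat invocations:

```shell
find . -name '*.txt' -type f \
    -exec grep -qF 'LINUX/UNIX' {} ';' \
    -exec cat {} '+' > myfile
```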
The grep utility is used here with its -q option. This makes it produce no output; instead, as soon as the given pattern matches, it terminates with a zero exit status, signalling "success". We use this exit status as a test in both of the commands above, to select only those files that contain the string LINUX/UNIX.

The -F option to grep makes it interpret the pattern as a fixed string rather than as a regular expression. This potentially makes the command a bit faster, and it also means you can search for strings like *this* without having to treat the * character specially (as it is special in regular expressions).

Both commands write the concatenated file data to a file called myfile. If that file already exists, it will be truncated (emptied); otherwise it will be created. I intentionally picked an output filename that would not be found by the find command, i.e. one that does not end with .txt.

Note that the question currently contains code that seems to filter the output of
find with grep, to then call cp via xargs. This is not the question's user's own code, and it has several issues. One issue is that it does not concatenate the contents of any files; another is that it applies the grep to the pathnames output by find rather than to the contents of the files. See also Why is looping over find's output bad practice? which is relevant here.

To use the format of the code in the question to actually solve the issue in this question, i.e. letting find produce a list of pathnames and then, separately, having grep select the ones that we're interested in, to finally cat these, one can chain find into two xargs stages. Such a pipeline passes a list of pathnames of files whose names end in
.txt from find to the first xargs as a nul-delimited list. The xargs utility invokes grep on these, and grep outputs the pathnames of the files that contain matches, again as a nul-delimited list. It's -l that makes grep output the pathnames of the matching files, and -Z that turns this into a nul-delimited list rather than a newline-delimited list.

This list is then read by the final xargs, which invokes cat on each file. The concatenated result is written to myfile as before.

Note that this is a much more awkward way of solving the issue, with potential for forgetting what format the file list is in between stages of the pipeline, and it assumes that whoever runs the code is using a GNU system, or at least GNU tools (i.e. it's hopelessly non-portable).
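Concretely, the pipeline dissected above might look like this (a sketch; -print0 on find, -0 on xargs, and -Z on grep are the GNU extensions in question):

```shell
# find emits matching pathnames as a nul-delimited list; grep -lZF keeps
# only the files containing the fixed string, still nul-delimited; the
# final xargs concatenates those files into myfile.
find . -name '*.txt' -type f -print0 |
    xargs -0 grep -lZF 'LINUX/UNIX' |
    xargs -0 cat > myfile
```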