Yes, `find ./work -print0 | xargs -0 rm` will execute something like `rm ./work/a "./work/b c" ...`. You can check with `echo`: `find ./work -print0 | xargs -0 echo rm` will print the command that will be executed (except whitespace will be handled appropriately, though the `echo` output won't show that).
To get `xargs` to put the names in the middle, you need to add `-I[string]`, where `[string]` is what you want to be replaced with the argument; in this case you'd use `-I{}`, e.g. `<strings.txt xargs -I{} grep {} directory/*`.
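A quick self-contained illustration of the `-I` placement (the input words are made up for the demo):

```shell
# Each input line replaces {} wherever it appears in the template,
# so the argument can sit in the middle of the command.
printf 'alpha\nbeta\n' | xargs -I{} echo before {} after
# before alpha after
# before beta after
```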
What you actually want to use is `grep -F -f strings.txt`:
```
-F, --fixed-strings
       Interpret PATTERN as a list of fixed strings, separated by
       newlines, any of which is to be matched. (-F is specified by
       POSIX.)
-f FILE, --file=FILE
       Obtain patterns from FILE, one per line. The empty file
       contains zero patterns, and therefore matches nothing. (-f is
       specified by POSIX.)
```
So `grep -Ff strings.txt subdirectory/*` will find all occurrences of any string in `strings.txt` as a literal; if you drop the `-F` option you can use regular expressions in the file. You could actually use `grep -F "$(<strings.txt)" directory/*` too. If you want to practice `find`, you can use the last two examples in the summary. If you want to do a recursive search instead of just the first level, you have a few options, also in the summary.
Summary:

```
# grep for each string individually.
<strings.txt xargs -I{} grep {} directory/*

# grep once for everything
grep -Ff strings.txt subdirectory/*
grep -F "$(<strings.txt)" directory/*

# Same, using find
find subdirectory -maxdepth 1 -type f -exec grep -Ff strings.txt {} +
find subdirectory -maxdepth 1 -type f -print0 | xargs -0 grep -Ff strings.txt

# Recursively
grep -rFf strings.txt subdirectory
find subdirectory -type f -exec grep -Ff strings.txt {} +
find subdirectory -type f -print0 | xargs -0 grep -Ff strings.txt
```
You may want to use the `-l` option to get just the name of each matching file if you don't need to see the actual line:
```
-l, --files-with-matches
       Suppress normal output; instead print the name of each input
       file from which output would normally have been printed. The
       scanning will stop on the first match. (-l is specified by
       POSIX.)
```
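A minimal sketch of the difference (the temporary files here are hypothetical, created just for the demo):

```shell
# Create two throwaway files; only one contains the pattern.
tmp=$(mktemp -d)
printf 'needle in here\n' > "$tmp/match.txt"
printf 'nothing here\n'   > "$tmp/other.txt"

# Without -l, grep prints the matching lines; with -l it prints
# only the name of each file that contains a match.
grep -l needle "$tmp"/*    # prints only .../match.txt
rm -r "$tmp"
```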
That's pretty much the most common way of finding the "N most common things", except you're missing a `sort`, and you've got a gratuitous `cat`:

```
tr -c '[:alnum:]' '[\n*]' < test.txt | sort | uniq -c | sort -nr | head -10
```
If you don't put a `sort` before the `uniq -c` you'll probably get a lot of false singleton words. `uniq` only collapses consecutive runs of identical lines, not overall uniqueness.
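To see why, a tiny made-up example:

```shell
# uniq -c only collapses adjacent duplicates, so unsorted input
# gives misleading per-run counts:
printf 'cat\ndog\ncat\n' | uniq -c
#   1 cat
#   1 dog
#   1 cat

# Sorting first groups identical lines together, giving true counts:
printf 'cat\ndog\ncat\n' | sort | uniq -c
#   2 cat
#   1 dog
```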
EDIT: I forgot a trick: "stop words". If you're looking at English text (sorry, monolingual North American here), words like "of", "and", and "the" almost always take the top two or three places. You probably want to eliminate them. The GNU Groff distribution includes a file named `eign` which contains a pretty decent list of stop words. My Arch distro has `/usr/share/groff/current/eign`, but I think I've also seen `/usr/share/dict/eign` or `/usr/dict/eign` in old Unixes.
You can use stop words like this:

```
tr -c '[:alnum:]' '[\n*]' < test.txt |
fgrep -v -w -f /usr/share/groff/current/eign |
sort | uniq -c | sort -nr | head -10
```
My guess is that most human languages need similar "stop words" removed from meaningful word-frequency counts, but I don't know where to suggest getting stop-word lists for other languages.
EDIT: the `fgrep` should use the `-w` option, which enables whole-word matching. This avoids false positives on words that merely contain short stop words, like "a" or "i".
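A small illustration with made-up input, filtering on the single stop word "a":

```shell
# Without -w, "a" matches any word that merely contains the letter,
# so real words are wrongly filtered out:
printf 'cat\nand\ndog\n' | grep -F -v a
# dog

# With -w, only the whole word "a" would be filtered, so all survive:
printf 'cat\nand\ndog\n' | grep -F -v -w a
# cat
# and
# dog
```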
Best Answer
Edit: If you have GNU utilities, see Gilles' answer for a method using GNU `grep`'s recursion abilities that is much simpler than the `find` approach. If you only want to display filenames, you'll still want to add the `-l` option as I describe below.

Use `grep -l word` to only print the names of files containing a match.

If you want to find all files in the file system ending in `.sh`, starting at the root `/`, then `find` is the most appropriate tool. The most portable and efficient recommendation is:

`find / -type f -name '*.sh' -exec grep -l word {} + 2>/dev/null`

This is about as readable as it gets, and is not hard to parse if you understand the semantics behind each of the components.
- `find /`: run `find` starting at the file system root, `/`.
- `-type f`: only match regular files.
- `-name '*.sh'`: ... and only match files whose names end in `.sh`.
- `-exec ... {} +`: run the command specified in `...` on the matched files in groups, where `{}` is replaced by the file names in the group. The idea is to run the command on as many files at once as possible, within the limits of the system (`ARG_MAX`). The efficiency of the `{} +` form comes from minimizing the number of times the `...` command must be called, by maximizing the number of files passed to each invocation.
- `grep -l word {}`: where `{}` is the same `{}` repeated from above and is replaced by the file names. As previously explained, `grep -l` prints the names of files containing a match for `word`.
- `2>/dev/null`: hide error messages (technically, redirect standard error to the black hole that is `/dev/null`). This is for aesthetic and practical reasons, since running `find` on `/` will likely produce reams of "permission denied" messages for files you do not have permission to read and directories you do not have permission to traverse.

There are some problems with the suggestions you received and posted in your question. Both fail on files with whitespace in their names. It's best to avoid putting filenames in command substitution altogether. The first one has the additional problem of potentially running into the `ARG_MAX` limit. The second one is close to what I suggest, but there is no good reason to use `xargs` here, not to mention that safe and correct usage of `xargs` requires sacrificing portability for some GNU-only options (`find -print0 | xargs -0`).
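To illustrate, a sketch on a hypothetical throwaway directory showing that `-exec ... {} +` copes with whitespace in names:

```shell
# find passes each filename as a separate argument to grep, so a
# name containing spaces is handled safely, with no quoting tricks.
tmp=$(mktemp -d)
printf 'word here\n' > "$tmp/file with spaces.sh"
printf 'no match\n'  > "$tmp/other.sh"
find "$tmp" -type f -name '*.sh' -exec grep -l word {} + 2>/dev/null
# prints only ".../file with spaces.sh"
rm -r "$tmp"
```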