I'm writing a quick tool to inspect the contents of a Node.js node_modules folder or Python virtualenv for native dependencies. As a quick first approximation, I wrote the following command:
find . | xargs file | awk '/C source/ {print $1} /ELF/ {print $1}'
I'm okay with false positives but not false negatives (e.g. files literally containing the string ELF or C source can be marked suspicious). However, this script can also break on long file names (because xargs will split them), on file names containing spaces (because awk splits on whitespace), and on file names containing newlines (because find uses newlines to separate paths).
Is there a way to filter the paths generated by find by checking whether the output of file {} (possibly with some additional options to remove the path entirely from the output of file) matches a particular regular expression?
Best Answer
The key factor in reaching find enlightenment ;) is:

There is an alternate approach to this question that is worth knowing about (as also described in Unix Power Tools, in the section "Using -exec to Create Custom Tests"):
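A minimal sketch of what such a custom test might look like. It is an illustration under assumptions, not necessarily the command this answer originally showed: it assumes a file implementation that supports -b (brief output, which omits the path), reuses the question's ELF and C source patterns, and wraps the command in a hypothetical function named list_native so it can be pointed at any directory. find hands each path to sh as $1, so names containing spaces or newlines are never re-parsed, and -print runs only when grep succeeds.

```shell
# list_native DIR (hypothetical helper): print every regular file under DIR
# whose file(1) description matches the pattern.
# -exec ... \; acts as a test here: -print runs only if the inner command exits 0.
# file -b leaves the path out of its output, so a file merely *named* ELF
# cannot cause a match.
list_native() {
    find "$1" -type f -exec sh -c 'file -b "$1" | grep -Eq "ELF|C source"' sh {} \; -print
}
```

For example, list_native node_modules would list the suspicious files in a Node.js project, one path per line.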
It's worth knowing about this filtering method since it can be used for many more things than simply printing the name of the file; just change the -print operator to any other operator you like (including another -exec operator) and do what you like with it.

There is a performance drawback to this command (which is also present in the other answer): since we are using \; and not +, we are spawning a shell for every single file. Using + to pass multiple files at once to the sh command and processing them with a for loop gives a noticeable performance advantage.

You can see the comparison for yourself by running both forms of the command and comparing the output of time.
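The two forms can be sketched side by side as follows. This is a hedged illustration rather than the answer's original commands: the function names native_per_file and native_batched are hypothetical, and the sketch again assumes file -b and the question's ELF and C source patterns. The timing lines in the comments assume a shell where time is a keyword that can time functions (bash, zsh, ksh).

```shell
# One shell per file: -exec ... \; starts a new sh for every path.
native_per_file() {
    find "$1" -type f -exec sh -c 'file -b "$1" | grep -Eq "ELF|C source"' sh {} \; -print
}

# One shell per batch: -exec ... {} + appends as many paths as fit on one
# command line, and "for f do" iterates over them as "$@".
# Matches are printed from inside the loop, so no -print is needed.
native_batched() {
    find "$1" -type f -exec sh -c '
        for f do
            if file -b "$f" | grep -Eq "ELF|C source"; then
                printf "%s\n" "$f"
            fi
        done' sh {} +
}

# To compare, run each form on the same tree, for example:
#   time native_per_file . >/dev/null
#   time native_batched . >/dev/null
```

Note that the batched form still runs file and grep once per path; the saving comes only from starting far fewer sh processes.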
The real point, though, is:

Never run a shell for loop on a list of files that is output from find. Instead, either run the action you need to do on each file directly within find by using the -exec operator, or embed a shell for loop within a find command and do it that way.

Some additional reasons: