Xargs into second side of pipe


I'm trying to do the following:

cat file1.txt | xargs -I{} "cat file2.txt | grep {}"

I'm expecting each line from file1.txt to be used as the pattern for the grep at the end of the pipeline. It's not working as expected.

Is this because -I{} stops looking for things to replace once it hits the pipe? Is there a way around this?

Best Answer

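It's because you need a shell to create a pipe or perform redirection. Without one, xargs takes the whole quoted string "cat file2.txt | grep {}" as the name of a single command to run, and no such command exists. Note also that cat is the command to concatenate; it makes little sense to use it for just one file.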
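A minimal POSIX sh sketch of the difference, using echo and tr as stand-ins for cat and grep (the input foo is arbitrary). Without sh -c, xargs looks for a program literally named after the quoted string, fails to find it and exits with status 127 ("command not found"). With sh -c, the shell parses the pipe.

```shell
# Without a shell: the quoted string is taken as the *name* of the command.
# No such program exists, so xargs fails (127 means "command not found").
printf 'foo\n' | xargs -I{} "echo {} | tr a-z A-Z" 2>/dev/null
without_sh=$?
echo "exit status without sh: $without_sh"

# With sh -c: the shell sets up the pipe; the input line arrives as "$1".
with_sh=$(printf 'foo\n' | xargs -I{} sh -c 'echo "$1" | tr a-z A-Z' sh {})
echo "output with sh: $with_sh"
```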

cat file1.txt | xargs -I{} sh -c 'cat file2.txt | grep -e "$1"' sh {}

Do not do:

cat file1.txt | xargs -I{} sh -c 'cat file2.txt | grep -e {}'

as that would amount to a command injection vulnerability. The {} would be expanded in the code argument to sh and so interpreted as shell code. For instance, if one of the lines of file1.txt were $(reboot), that would call reboot.
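The injection can be reproduced harmlessly. This sketch uses $(echo INJECTED) as a hypothetical stand-in for a hostile line such as $(reboot), and printf instead of grep so the effect is visible.

```shell
# Harmless stand-in for a hostile input line such as $(reboot).
line='$(echo INJECTED)'

# Unsafe: {} is pasted into the code string, so sh runs the substitution.
unsafe=$(printf '%s\n' "$line" | xargs -I{} sh -c 'printf "%s\n" {}')

# Safe: the line is passed as data in "$1" and never parsed as code.
safe=$(printf '%s\n' "$line" | xargs -I{} sh -c 'printf "%s\n" "$1"' sh {})

echo "unsafe: $unsafe"   # the command substitution was executed
echo "safe:   $safe"     # the line is printed literally
```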

The -e (or --) is also important. Without it, you'd have problems with regexps starting with -, which grep would take as options.
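A small sketch of the problem, assuming a hypothetical file2.txt with one line containing the text -v. With -e or --, -v is the pattern. Without them, grep parses -v as its invert-match option and takes the file name as the pattern, then reads standard input (here /dev/null).

```shell
tmp=$(mktemp -d)
# Hypothetical sample data: one line contains the text "-v".
printf 'keep -v here\nother line\n' > "$tmp/file2.txt"

# With -e (or --), "-v" is the regexp to look for.
with_e=$(grep -e '-v' "$tmp/file2.txt")
with_dashdash=$(grep -- '-v' "$tmp/file2.txt")

# Without either, "-v" becomes an option, the file name becomes the
# pattern, and grep reads stdin instead, so nothing useful is found.
without=$(grep '-v' "$tmp/file2.txt" </dev/null)

echo "with -e: $with_e"
echo "with --: $with_dashdash"
echo "without: [$without]"
rm -rf "$tmp"
```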

You can simplify the above using redirections instead of cat:

< file1.txt xargs -I{} sh -c '< file2.txt grep -e "$1"' sh {}

Or simply pass the file name as an argument to grep instead of using redirections, in which case you can even drop the sh:

< file1.txt xargs -I{} grep -e {} file2.txt

You could also tell grep to look for all the regexps at once in a single invocation:

grep -f file1.txt file2.txt

Note, however, that in that case each line of file1.txt is taken as one regexp, with none of the special quote processing done by xargs.
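To make the difference concrete, here is a sketch with hypothetical sample files where the pattern line carries literal double quotes. xargs strips them and searches for bar baz, while grep -f searches for the quotes too.

```shell
tmp=$(mktemp -d)
# Hypothetical sample files: the pattern line contains literal double quotes.
printf '"bar baz"\n' > "$tmp/file1.txt"
printf 'bar baz\n"bar baz"\n' > "$tmp/file2.txt"

# xargs removes the quotes, so grep looks for: bar baz (both lines match).
via_xargs=$(< "$tmp/file1.txt" xargs -I{} grep -e {} "$tmp/file2.txt")

# grep -f uses the line as is, quotes included (only the second line matches).
via_f=$(grep -f "$tmp/file1.txt" "$tmp/file2.txt")

printf 'via xargs:\n%s\nvia -f:\n%s\n' "$via_xargs" "$via_f"
rm -rf "$tmp"
```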

xargs by default considers its input as a list of words separated by blanks (with some implementations only space and tab, on others any character in the [:blank:] class of the current locale) or newlines, in which backslash and single and double quotes can be used to escape the separators (newline can only be escaped by backslash, though) or each other.

For instance, on an input like:

 'a "b'\" "bar baz" x\
y

xargs without -I{} would pass a "b", bar baz and x<newline>y to the command.
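The tokenization can be checked by feeding that exact input to xargs and having sh print each argument in brackets. The input is written through a here-document with a quoted delimiter so the shell leaves the quotes and the trailing backslash untouched; the temporary file name is an arbitrary choice.

```shell
tmp=$(mktemp -d)
# The sample input from above, written verbatim.
cat > "$tmp/input.txt" <<'EOF'
 'a "b'\" "bar baz" x\
y
EOF

# Print each argument xargs builds, bracketed so blanks/newlines show.
xargs sh -c 'for arg do printf "[%s]\n" "$arg"; done' sh < "$tmp/input.txt"

count=$(xargs sh -c 'echo "$#"' sh < "$tmp/input.txt")
first=$(xargs sh -c 'printf "%s" "$1"' sh < "$tmp/input.txt")
third=$(xargs sh -c 'printf "%s" "$3"' sh < "$tmp/input.txt")
rm -rf "$tmp"
```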

With -I{}, xargs gets one argument per line but still does some extra processing. It ignores leading (but not trailing) blanks. Blanks are no longer considered separators, but quote processing is still done.

On the input above, xargs -I{} would pass a single a "b" bar baz x<newline>y argument to the command. Also note that on many systems, as allowed by POSIX, that won't work if words are more than 255 characters long. All in all, xargs -I{} is pretty useless.
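The same input run through xargs -I{} can be inspected the same way. This sketch (tested mentally against GNU xargs; other implementations may differ) counts the arguments, shows the one argument built, and checks the leading/trailing blank handling on a hypothetical line.

```shell
tmp=$(mktemp -d)
cat > "$tmp/input.txt" <<'EOF'
 'a "b'\" "bar baz" x\
y
EOF

# With -I{}, the whole quote-processed line becomes a single argument.
count=$(xargs -I{} sh -c 'echo "$#"' sh {} < "$tmp/input.txt")
arg=$(xargs -I{} sh -c 'printf "[%s]" "$1"' sh {} < "$tmp/input.txt")
echo "arguments: $count"
echo "$arg"

# Leading blanks are dropped, trailing ones are kept.
blanks=$(printf '  foo  \n' | xargs -I{} printf '[%s]' {})
echo "$blanks"
rm -rf "$tmp"
```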

If you want each line to be passed verbatim as an argument to the command, you could use the GNU xargs -d '\n' extension:

< file1.txt xargs -d '\n' -n 1 grep file2.txt -e
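A quick comparison, assuming GNU xargs (the -d option is a GNU extension): with -d '\n' a hypothetical line with surrounding blanks and single quotes is passed unchanged, whereas default xargs strips both.

```shell
# GNU xargs only: -d '\n' makes each line one argument, verbatim.
verbatim=$(printf '%s\n' "  'quoted'  " | xargs -d '\n' -n 1 printf '[%s]')

# Default xargs: blanks separate words and the quotes are removed.
plain=$(printf '%s\n' "  'quoted'  " | xargs -n 1 printf '[%s]')

echo "with -d: $verbatim"
echo "default: $plain"
```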

(here relying on another GNU grep extension that allows options to be passed after non-option arguments, provided POSIXLY_CORRECT is not in the environment), or portably:

sed "s/'/'\\\\\\''/g;s/.*/'&'/" file1.txt | xargs -n1 sh -c '
  for line do
    grep -e "$line" file2.txt
  done' sh
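To see what the portable sed step does, this sketch runs it on a hypothetical line containing a single quote and prints the intermediate result. Each line is wrapped in single quotes and every embedded ' becomes '\'', which is quoting that xargs undoes.

```shell
# A hypothetical line containing a single quote and blanks.
line="it's a test"

# Step 1: the sed command from above quotes the line for xargs.
quoted=$(printf '%s\n' "$line" | sed "s/'/'\\\\\\''/g;s/.*/'&'/")
echo "$quoted"

# Step 2: xargs removes that quoting and passes the line as one argument.
restored=$(printf '%s\n' "$quoted" | xargs -n1 printf '%s')
echo "$restored"
```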

If you wanted each word in file1.txt (quotes still recognised), as opposed to each line, to be looked for (which would also work around the trailing-blank issue if you have one word per line anyway), you can use xargs -n1 alone instead of -I:

< file1.txt xargs -n1 sh -c '
  for word do
    grep -e "$word" file2.txt
  done' sh
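To show which patterns that loop would search for, this sketch swaps grep for printf and feeds it a hypothetical line holding two words, the second one quoted.

```shell
# Hypothetical input: one line with two words, the second one quoted.
words=$(printf '%s\n' 'foo "bar baz"' | xargs -n1 sh -c '
  for word do
    printf "[%s]\n" "$word"
  done' sh)
echo "$words"
```

Each bracketed entry is one grep invocation in the real command: foo and bar baz are searched for separately.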

To strip leading and trailing blanks (but without the quote processing that xargs does), you could also do:

unset IFS # restore word splitting to its default
while read -r regexp; do
  grep -e "$regexp" file2.txt
done < file1.txt
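A sketch of what read -r keeps and strips, using a hypothetical line with surrounding blanks, quotes and a backslash: the blanks at both ends go, while the quotes, the backslash and the inner blanks are kept as is.

```shell
unset IFS   # default splitting: leading/trailing space, tab, newline stripped
result=$(printf '  "a b" \\x  \n' | while read -r regexp; do
  printf '[%s]\n' "$regexp"
done)
echo "$result"
```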