IO Redirection – Using xargs with stdin/stdout Redirection

io-redirection xargs

I would like to run:

./a.out < x.dat > x.ans

for each *.dat file in the directory A.

Sure, this could be done with a bash/python/whatever script, but I would like to write a neat one-liner. The closest I have come (still without redirecting stdout) is:

ls A/*.dat | xargs -I file -a file ./a.out

But the -a option of xargs does not understand the replace-str 'file'.

Thank you for help.

Best Answer

First of all, do not use ls output as a file list. Use shell expansion or find. See below for potential consequences of ls+xargs misuse and an example of proper xargs usage.

1. Simple way: for loop

If you want to process just the files under A/, then a simple for loop should be enough:

for file in A/*.dat; do ./a.out < "$file" > "${file%.dat}.ans"; done
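A minimal, self-contained sketch of the loop above. Since a.out is not available here, `tr a-z A-Z` is a hypothetical stand-in for it, and the temporary directory and sample file names are made up for illustration. The `[ -e "$file" ]` guard is an addition, not part of the original answer: it skips the literal pattern when no .dat file matches.

```shell
# Sketch only: tr stands in for ./a.out; directory and files are throwaway samples.
tmp=$(mktemp -d)
mkdir "$tmp/A"
printf 'hello\n' > "$tmp/A/x.dat"
printf 'world\n' > "$tmp/A/name with spaces.dat"

for file in "$tmp"/A/*.dat; do
    [ -e "$file" ] || continue                  # no match: the glob stays literal, skip it
    tr a-z A-Z < "$file" > "${file%.dat}.ans"   # x.dat -> x.ans
done

ls "$tmp/A"
```

Because "$file" is always quoted, the name containing spaces is handled like any other.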

2.pre1 Why not `ls | xargs`?

Here is an example of how badly things can go wrong if you use ls with xargs for this job. Consider the following scenario:

first, let's create some empty files:

$ touch A/mypreciousfile.dat\ with\ junk\ at\ the\ end.dat
$ touch A/mypreciousfile.dat
$ touch A/mypreciousfile.dat.ans

see the files and that they contain nothing:

$ ls -1 A/
mypreciousfile.dat
mypreciousfile.dat with junk at the end.dat
mypreciousfile.dat.ans

$ cat A/*

run a magic command using xargs:

$ ls A/*.dat | xargs -I file sh -c "echo TRICKED > file.ans"

the result:

$ cat A/mypreciousfile.dat
TRICKED with junk at the end.dat.ans

$ cat A/mypreciousfile.dat.ans
TRICKED

So you've just managed to overwrite both mypreciousfile.dat and mypreciousfile.dat.ans. If there were any content in those files, it'd have been erased.
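To see why this happens without overwriting anything, the same pipeline can be run as a dry run. The sketch below recreates the two .dat files in a throwaway directory (an assumption for illustration) and replaces `sh -c` with `echo`, so xargs only prints the command string it would have handed to the shell.

```shell
# Dry run: print the command strings instead of executing them.
tmp=$(mktemp -d)
mkdir "$tmp/A"
touch "$tmp/A/mypreciousfile.dat" "$tmp/A/mypreciousfile.dat with junk at the end.dat"

# echo receives the already-substituted string as a single argument.
preview=$(cd "$tmp" && ls A/*.dat | xargs -I file echo 'echo TRICKED > file.ans')
printf '%s\n' "$preview"
```

In the line built from the second file name, the shell would see `> A/mypreciousfile.dat` as the redirection target and treat the remaining words as extra arguments to echo, which matches the result shown above.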

2. Using `xargs` : the proper way with `find`

If you insist on using xargs, use -0 (null-terminated names):

find A/ -name "*.dat" -type f -print0 | xargs -0 -I file sh -c './a.out < "file" > "file.ans"'

Notice two things:

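this way you'll create files ending in .dat.ans;
this will break if a file name contains a double quote ("), because the name is pasted into the shell command string.

Both issues can be solved by invoking the shell differently, passing the file name as an argument ($0) instead of embedding it in the command string: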

find A/ -name "*.dat" -type f -print0 | xargs -0 -L 1 bash -c './a.out < "$0" > "${0%dat}ans"'
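A small sketch of the argument-passing form, checked against a file name that contains double quotes. The assumptions: `tr a-z A-Z` is a hypothetical stand-in for ./a.out, the sample files live in a throwaway directory, sh is used instead of bash (the parameter expansion involved is POSIX), and `-n 1` (one argument per invocation) is used in place of `-L 1`.

```shell
# Sketch only: tr stands in for ./a.out; sample files are made up.
tmp=$(mktemp -d)
mkdir "$tmp/A"
printf 'abc\n' > "$tmp/A/plain.dat"
printf 'def\n' > "$tmp/A/odd \"quoted\" name.dat"

# Each name arrives as $0 of the inner shell, so it is data, not command text.
find "$tmp/A" -name '*.dat' -type f -print0 |
    xargs -0 -n 1 sh -c 'tr a-z A-Z < "$0" > "${0%.dat}.ans"'

ls "$tmp/A"
```

The quotes in the file name never reach the shell parser, so no special handling is needed for them.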

3. All done within `find ... -exec`

 find A/ -name "*.dat" -type f -exec sh -c './a.out < "{}" > "{}.ans"' \;

This, again, produces .dat.ans files and will break if a file name contains ". To work around that, use bash and change the way it is invoked:

 find A/ -name "*.dat" -type f -exec bash -c './a.out < "$0" > "${0%dat}ans"' {} \;
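A possible variant, not taken from the answer itself: with `{} +` instead of `{} \;`, find passes many file names to a single shell, which then loops over them. As before, `tr a-z A-Z` is a hypothetical stand-in for ./a.out and the sample files are invented.

```shell
# Sketch only: one sh invocation handles a whole batch of names.
tmp=$(mktemp -d)
mkdir "$tmp/A"
printf 'abc\n' > "$tmp/A/one.dat"
printf 'def\n' > "$tmp/A/two words.dat"

# The trailing "sh" fills $0; the file names become "$@",
# which "for f do" iterates over.
find "$tmp/A" -name '*.dat' -type f -exec sh -c '
    for f do
        tr a-z A-Z < "$f" > "${f%.dat}.ans"
    done' sh {} +

ls "$tmp/A"
```

This starts far fewer shells than one per file, at the cost of a slightly longer command.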

Related Solutions

Bash – Understanding i/o redirection in the context of process substitution

Q 1

...seems operationally no different from a simple unnamed pipe...

Well, "Process Substitution" is based precisely on an unnamed pipe, as the first link you gave states:

The bash process creates an unnamed pipe for communication between the two processes created later.

The difference is that all of the ~6 steps explained in the link are condensed into one idiom: >(...) for writing to and <(...) for reading from.

It could also be argued that the connection (pipe) does have a name, just as a file has. That name is merely hidden from the user (it is the /proc/self/fd/11 shown at the start).
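The hidden name can be made visible by letting echo print the expansion instead of using it. This is only a sketch: the exact path depends on the system and the bash version (on Linux it is typically of the form /dev/fd/N), and `bash -c` is used so that the lines also work when started from a shell without process substitution.

```shell
# Print what <(...) and >(...) expand to; "true" is just a do-nothing command.
read_name=$(bash -c 'echo <(true)')
write_name=$(bash -c 'echo >(true)')
printf 'read side:  %s\nwrite side: %s\n' "$read_name" "$write_name"
```

Both expansions are ordinary path names, which is why they can be handed to any command that expects a file argument.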

Example 1

1) I add a pipe and redirection of shasum's output ...
$ cat file_{1,2,3} | tee file_4 | shasum -a 256 > file_4.sha256

There is no "Process Substitution" there, but it is worth noting (for later) that tee writes whatever it receives on its stdin to the file file_4 and also sends the same content to its stdout, which in this case is connected to a pipe that feeds shasum.

So, in short, in layman's terms, tee copies stdin to both file_4 and shasum.

Example 2

2) I try the same with ProcSub:
$ cat file_{1,2,3} | tee file_4 >(shasum -a 256 > file_4.sha256)

Re-using the description above (in layman terms) to describe this example:

tee copies stdin to three places: file_4, shasum and stdout.

Why? Remember that >(...) expands to the name of a file. Let's put that name in the command line:

$ cat file_{1,2,3} | tee file_4 /proc/self/fd/11

tee is writing its input to two file arguments, file_4 and shasum (via "Process Substitution"), while the stdout of tee is still connected to its default place: the console. That is why you see the numbers in the console.

To make this example exactly equal to 1), we could do:

$ cat file_{1,2,3} | tee file_4 > /proc/self/fd/11  ### note the added `>`

Which becomes (yes, the space between > and >( is required):

$ cat file_{1,2,3} | tee file_4 > >(shasum -a 256 > file_4.sha256)

That is redirecting tee's stdout to the "Process Substitution".
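The three forms can be compared side by side with a rough sketch. The assumptions: cksum stands in for `shasum -a 256`, the sample data and temporary file are invented, and the checksum is printed to stdout rather than written to file_4.sha256 so that a command substitution can capture it. `bash -c` is used for the two lines that need process substitution.

```shell
# Sketch only: cksum replaces shasum; data and paths are throwaway samples.
tmp=$(mktemp -d)
data='alpha
beta'

# Example 1: plain pipe, tee's stdout feeds cksum.
piped=$(printf '%s\n' "$data" | tee "$tmp/file_4" | cksum)

# Example 2: >(...) is just another file argument, so tee's own stdout
# still carries a copy of the data next to the checksum.
both=$(bash -c 'printf "%s\n" "$1" | tee "$0" >(cksum)' "$tmp/file_4" "$data")

# Last form: tee's stdout is redirected into the process substitution.
redirected=$(bash -c 'printf "%s\n" "$1" | tee "$0" > >(cksum)' "$tmp/file_4" "$data")

printf 'piped:      %s\nredirected: %s\n' "$piped" "$redirected"
printf 'both:\n%s\n' "$both"
```

The first and last forms yield the same checksum and nothing else, while the middle one additionally contains the data lines, which is the console output described above.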

Q 3

Q: So the general question is: how are i/o processed for the 3 cases above

I believe I have just explained the 3 cases; if anything is unclear, please comment.

Q 4 (from the comments; please edit the question to add it)

why the <(...) construct won't work in the third case.

Because (in layman's terms) you cannot insert a male prong into a male socket.

The <(...) idiom reads from what is inside the "Process Substitution": it provides an "output" and should therefore be connected to the stdin of the outside command. The outside command, tee, treats its file arguments as places to write to, so it is trying to connect two output-like ends. That pair cannot match.

An important note: the cat command hides some details when applied to "Process Substitution", as both of these commands give the same output:

$ cat   <(date)
$ cat < <(date)

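Both are correct, but they work differently: in the first, cat receives a file name as an argument and opens it itself; in the second, the shell redirects that file to cat's stdin. Drawing conclusions from this misleading equality is wrong.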
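The difference that cat hides becomes visible with a command that reports the file name it was given. The sketch below uses wc for that purpose (an illustrative choice, not from the original answer) and runs through `bash -c` so it does not depend on the calling shell.

```shell
# With a file-name argument, wc prints the name; when reading stdin it has none to print.
with_name=$(bash -c 'wc -l <(printf "a\n")')
as_stdin=$(bash -c 'wc -l < <(printf "a\n")')
printf 'argument: %s\nstdin:    %s\n' "$with_name" "$as_stdin"
```

The first result includes the path that <(...) expanded to, while the second contains only the line count.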

