Bash – Understanding i/o redirection in the context of _process substitution_

bash, io-redirection, pipe, process-substitution, tee

Running GNU bash, version 4.3.11(1)-release (x86_64-pc-linux-gnu).

I don't really understand process substitution (ProcSub) from the perspective of a user interested in lifting the hood on i/o processing and related speed issues. I use ProcSub to script, so I have some knowledge of File Descriptors 0,1,2, but that's pretty much it. I have read a few pretty good posts, e.g. [1], and misc. wikis, e.g. [2],[3], the latter stating: "Process substitution feeds the output (FD 1 and/or 2) of a process (or processes) into the stdin (FD 0) of another process". By that simplest of definitions and for just one process, it seems operationally no different from a simple unnamed pipe.

To look into that I started with tee, in itself interesting from the point of view of i/o.
tee permits feeding "stdin to stdout and also to any files given as argument". So :

$ for i in 1 2 3; do (( j=i+10 )); printf "%d\n" $j > file_$i; done
# so each file in file_{1,2,3} contains the numeral in its name + 10.
$ cat file_{1,2,3} | tee file_4
11
12
13
$ cat file_4
11
12
13

Obviously, I am not interested in seeing data filling my screen à la Matrix, so when:

1) I add a pipe and redirection of shasum's output …

$ cat file_{1,2,3} | tee file_4 | shasum -a 256 > file_4.sha256
$ 

the one liner above exits quietly, file_4 is as before (above) and file_4.sha256 contains the computed SHA256 sum.

The above is just an example to illustrate my question, trying to understand intermediate i/o's. My layman's conclusion is that tee saves the output of the cat cmd in file_4 and its copy normally sent to stdout is actually not sent to stdout but piped to shasum.
Q: Is this even remotely right ?

2) I try the same with ProcSub:

$ cat file_{1,2,3} | tee file_4 >(shasum -a 256 > file_4.sha256)
11
12
13
$ 

-> No stdout redirection of whatever being sent to FD 1 by tee ?

Q: I am not clear on what ProcSub does or does not do to i/o (obviously it does not affect i/o in this case) and could use an explanation of its mechanism.

3) I try with ProcSub and redirection of final stdout to file_4:

$ cat file_{1,2,3} | tee >(shasum -a 256 > file_4.sha256) > file_4
$ 

Again, this time the one-liner exits quietly.

Q: So the general question is: how are i/o processed for the 3 cases above (or at least for the second and third) ? There are obvious and visible differences in i/o terms (just looking at final stdout), but sometimes different i/o processes can lead to identical end-results on the display. Tx.

Best Answer

The idiom >(...) just means (in layman terms): "the name of a file".

And it works as a "name of a file" (sort of, all will be clear in an instant):

$ echo <(date)
/proc/self/fd/11

Or some other number/name on your OS. But echo does print a name, exactly as if you do:

$ echo ProcSubs11
ProcSubs11

And if a file named ProcSubs11 exists, you could also do:

$ cat ProcSubs11
contents_of_file_ProcSubs11

You can do exactly the same with:

$ cat <(date)
Fri Jan 15 21:25:18 UTC 2016

The difference is that the actual name of the "Process Substitution" is "not visible" to you, and that what happens behind it is considerably more involved than reading a simple file, as described in full (and painful) detail in the linked question "How process substitution is implemented in bash?".
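To make the "name of a file" idea concrete, here is a minimal sketch, assuming bash on a system where the substitution is backed by a pipe (the usual case on Linux). The variable names `path` and `kind` are only illustrative.

```bash
#!/usr/bin/env bash
# Sketch: what the shell puts in place of <(...). Names are illustrative.

# echo never opens its argument; it only prints the path bash substituted.
path=$(echo <(true))
echo "substituted path: $path"

# test -p succeeds when the path refers to a pipe (FIFO).
if [ -p <(true) ]; then
    kind="pipe"
else
    kind="other"
fi
echo "the path refers to: $kind"
```

The exact path differs between systems, which is why no particular value is shown here.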


Having said the above, let's review your items.

Q 1

...seems operationally no different from a simple unnamed pipe...

Well, "Process Substitution" is indeed built on an unnamed pipe, as the first link you gave states:

  1. The bash process creates an unnamed pipe for communication between the two processes created later.

The difference is that all the ~6 steps explained in the link are simplified to one idiom >(...) for writing to and <(...) for reading from.

And it could be argued that the connection (pipe) has a name, just as a file has. It is only that the name is hidden from the user (the /proc/self/fd/11 shown at the start).

Example 1

1) I add a pipe and redirection of shasum's output ...

$ cat file_{1,2,3} | tee file_4 | shasum -a 256 > file_4.sha256

There is no "Process Substitution" here, but it is worth noting (for later) that tee writes what it receives on its stdin to the file file_4 and also sends the same content to its stdout, which in this case happens to be connected to a pipe that feeds shasum.

So, in short and in layman's terms, tee copies stdin to both file_4 and shasum.
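A quick way to check this for yourself, as a sketch only: it works in a temporary directory, and `wc -l` stands in for `shasum -a 256` so the result is easy to read (the names `input` and `file_4.count` are assumptions made for the illustration).

```bash
#!/usr/bin/env bash
# Sketch of Example 1; wc -l is a stand-in for shasum -a 256.
tmp=$(mktemp -d) && cd "$tmp" || exit 1

printf '%s\n' 11 12 13 > input      # same data as file_{1,2,3} concatenated

# tee writes its stdin to file_4 and to its stdout; stdout is the pipe to wc.
cat input | tee file_4 | wc -l > file_4.count

echo "lines saved in file_4:  $(wc -l < file_4)"
echo "lines seen by consumer: $(cat file_4.count)"
```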

Example 2

2) I try the same with ProcSub:

$ cat file_{1,2,3} | tee file_4 >(shasum -a 256 > file_4.sha256)

Re-using the description above (in layman terms) to describe this example:

tee copies stdin to three destinations: file_4, shasum and stdout.

Why? Remember that >(...) is the name of a file; let's put that in the line:

$ cat file_{1,2,3} | tee file_4 /proc/self/fd/11

tee is serving the input to two files, file_4 and shasum (via "Process Substitution"), and the stdout of tee is still connected to its default place: the console. That is why you see the numbers in the console.
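The three destinations can be observed with a sketch like the following. Again `wc -l` stands in for `shasum -a 256`, and tee's stdout is captured in a variable (`shown`, a name chosen for this illustration) instead of going to the console.

```bash
#!/usr/bin/env bash
# Sketch of Example 2; wc -l stands in for shasum -a 256.
tmp=$(mktemp -d) && cd "$tmp" || exit 1
printf '%s\n' 11 12 13 > input

# tee gets two file arguments (file_4 and the substituted path) and still
# writes a third copy to its stdout, captured here instead of the terminal.
shown=$(tee file_4 >(wc -l > file_4.count) < input)

# The substituted process runs asynchronously: wait (up to ~5 s) for its output.
for _ in $(seq 50); do [ -s file_4.count ] && break; sleep 0.1; done

echo "copy on stdout:"; echo "$shown"
echo "lines seen by consumer: $(cat file_4.count)"
```

The polling loop is only there because the shell does not wait for the substituted process before moving on to the next command.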

To make this example exactly equal to 1), we could do:

$ cat file_{1,2,3} | tee file_4 > /proc/self/fd/11  ### note the added `>`

Which becomes (yes, the space between > and >( is required):

$ cat file_{1,2,3} | tee file_4 > >(shasum -a 256 > file_4.sha256)

That is redirecting tee's stdout to the "Process Substitution".
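As a hedged sketch of that redirected form (with `wc -l` standing in for `shasum -a 256`, and file names assumed for the illustration), nothing is left to reach the console because tee's stdout itself is the process substitution:

```bash
#!/usr/bin/env bash
# Sketch: tee's stdout redirected into the process substitution.
tmp=$(mktemp -d) && cd "$tmp" || exit 1

# file_4 is tee's only file argument; "> >(...)" replaces tee's stdout.
printf '%s\n' 11 12 13 | tee file_4 > >(wc -l > file_4.count)

# The substituted process runs asynchronously: wait (up to ~5 s) for its output.
for _ in $(seq 50); do [ -s file_4.count ] && break; sleep 0.1; done

echo "lines saved in file_4:  $(wc -l < file_4)"
echo "lines seen by consumer: $(cat file_4.count)"
```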

Q 3

Q: So the general question is: how are i/o processed for the 3 cases above

I believe I have just explained the 3 cases; if anything is not clear, please comment.
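For the third case in the question, the roles are simply swapped: the substitution is tee's file argument and stdout is redirected to file_4. A minimal sketch, with `wc -l` standing in for `shasum -a 256` and the file names assumed for the illustration:

```bash
#!/usr/bin/env bash
# Sketch of case 3: substitution as file argument, stdout redirected to file_4.
tmp=$(mktemp -d) && cd "$tmp" || exit 1

# tee writes one copy to the substituted path and one copy to its stdout,
# which the final "> file_4" sends to the file rather than the terminal.
printf '%s\n' 11 12 13 | tee >(wc -l > file_4.count) > file_4

# The substituted process runs asynchronously: wait (up to ~5 s) for its output.
for _ in $(seq 50); do [ -s file_4.count ] && break; sleep 0.1; done

echo "lines saved in file_4:  $(wc -l < file_4)"
echo "lines seen by consumer: $(cat file_4.count)"
```

Both copies have somewhere to go other than the terminal, which is why the one-liner finishes quietly.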

Q 4 (in comments, Please edit and add the question)

why the <(...) construct won't work in the third case.

Because (in layman's terms) you cannot plug one male plug into another.

The <(...) idiom reads from what is inside the "Process Substitution": it provides an output, and so belongs on the stdin side of the outside command. The outside command, tee, is trying to connect it to stdout-like elements (things it writes to), so the pair cannot match.

An important note: the cat command hides some details when applied to "Process Substitution", as both of these commands give the same output:

$ cat   <(date)
$ cat < <(date)

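Both are correct, but the identical output is misleading: in the first form cat opens the substituted path given as an argument, while in the second it reads the substitution through its redirected stdin. Drawing conclusions from that apparent equality is wrong.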
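What cat hides can be exposed with a small sketch. `count_args` is a hypothetical helper (not a standard command) that only reports how many arguments it was given:

```bash
#!/usr/bin/env bash
# Sketch: count_args is a hypothetical helper that prints its argument count.
count_args() { echo "$#"; }

as_argument=$(count_args <(date))    # substitution passed as a file name
as_stdin=$(count_args < <(date))     # substitution redirected into stdin

echo "cmd   <(date): $as_argument argument(s); data is read by opening the path"
echo "cmd < <(date): $as_stdin argument(s); data arrives on stdin"
```

cat happens to handle both situations (a file argument, or stdin when there is none), which is why the two forms look the same with it.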
