Bash Process Substitution – Process Substitution Output Is Out of Order

bash, process-substitution

The command

echo one; echo two > >(cat); echo three;

gives unexpected output.

I read this: How is process substitution implemented in bash? and many other articles about process substitution on the internet, but I don't understand why it behaves this way.

Expected output:

one
two
three

Real output:

prompt$ echo one; echo two > >(cat); echo three;
one
three
prompt$ two

Also, these two commands should be equivalent from my point of view, but they aren't:

##### first command - the pipe is used.
prompt$ seq 1 5 | cat
1
2
3
4
5
##### second command - the process substitution and redirection are used.
prompt$ seq 1 5 > >(cat)
prompt$ 1
2
3
4
5

Why do I think they should be the same? Because both connect the seq output to the cat input through an anonymous pipe – Wikipedia, Process substitution.

Question: Why does it behave this way? Where is my error? A comprehensive answer is desired (with an explanation of how bash does it under the hood).

Best Answer

Yes, in bash, like in ksh (where the feature comes from), the processes inside the process substitution are not waited for (before running the next command in the script).

For a <(...) one, that's usually fine, as in:

cmd1 <(cmd2)

the shell will be waiting for cmd1, and cmd1 will typically be waiting for cmd2 by virtue of reading until end-of-file on the pipe that is substituted, and that end-of-file typically happens when cmd2 dies. That's the same reason several shells (not bash) don't bother waiting for cmd2 in cmd2 | cmd1.
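A quick illustration of that <(...) case (a sketch, with cat standing in for cmd1 and seq for cmd2):

```shell
# cat (cmd1) reads the substituted pipe until end-of-file, which only
# arrives once seq (cmd2) has exited, so "done" reliably comes last
cat <(seq 1 3)
echo done
```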

For cmd1 >(cmd2), however, that's generally not the case, as there it's rather cmd2 that waits for cmd1, so cmd2 will generally exit after cmd1 does.

That's fixed in zsh, which waits for cmd2 there (but not if you write it as cmd1 > >(cmd2) and cmd1 is not a builtin; use {cmd1} > >(cmd2) instead, as documented).

ksh doesn't wait by default, but lets you wait for it with the wait builtin (it also makes the pid available in $!, though that doesn't help if you do cmd1 >(cmd2) >(cmd3))

rc (with the cmd1 >{cmd2} syntax): same as ksh, except you can get the pids of all the background processes with $apids.

es (also with cmd1 >{cmd2}) waits for cmd2 like in zsh, and also waits for cmd2 in <{cmd2} process redirections.

bash does make the pid of cmd2 (or more exactly of the subshell, as bash runs cmd2 in a child process of that subshell even though it's the last command there) available in $!, but doesn't let you wait for it.
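You can see both points on the question's example (a sketch; the sleep is only there to widen the race and make the ordering obvious):

```shell
# bash exposes the pid of the process-substitution subshell in $!,
# but carries on without waiting for it
echo two > >(sleep 0.2; cat)
echo "substitution pid: $!"
echo three     # almost always prints before "two"
```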

If you do have to use bash, you can work around the problem by using a command that will wait for both commands with:

{ { cmd1 >(cmd2); } 3>&1 >&4 4>&- | cat; } 4>&1

That makes both cmd1 and cmd2 have their fd 3 open to a pipe. cat will wait for end-of-file at the other end, so will typically only exit when both cmd1 and cmd2 are dead. And the shell will wait for that cat command. You could see that as a net to catch the termination of all background processes (you can use it for other things started in background like with &, coprocs or even commands that background themselves provided they don't close all their file descriptors like daemons typically do).
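Applied to the example from the question (with echo two playing cmd1 and cat playing cmd2), the workaround restores the expected order:

```shell
echo one
# fd 3 of both the echo and the substituted cat points into the pipe
# read by the outer cat, which therefore only sees end-of-file (and
# lets the shell move on) once both of them have exited
{ { echo two > >(cat); } 3>&1 >&4 4>&- | cat; } 4>&1
echo three
```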

Note that thanks to that wasted subshell process mentioned above, it works even if cmd2 closes its fd 3 (commands usually don't do that, but some like sudo or ssh do). Future versions of bash may eventually do the optimisation there like in other shells. Then you'd need something like:

{ { cmd1 >(sudo cmd2; exit); } 3>&1 >&4 4>&- | cat; } 4>&1

To make sure there's still an extra shell process with that fd 3 open waiting for that sudo command.

Note that cat won't read anything (since the processes don't write on their fd 3). It's just there for synchronisation. It will do just one read() system call that will return with nothing at the end.

You can actually avoid running cat by using a command substitution to do the pipe synchronisation:

{ unused=$( { cmd1 >(cmd2); } 3>&1 >&4 4>&-); } 4>&1

This time, it's the shell instead of cat that is reading from the pipe whose other end is open on fd 3 of cmd1 and cmd2. We're using a variable assignment so the exit status of cmd1 is available in $?.
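Applied to the question's example (again a sketch; unused is an arbitrary variable name):

```shell
echo one
# the command substitution reads the fd-3 pipe until end-of-file, so
# the assignment only completes once both echo and the substituted
# cat have exited
{ unused=$( { echo two > >(cat); } 3>&1 >&4 4>&-); } 4>&1
echo three
```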

Or you could do the process substitution by hand, and then you could even use your system's sh as that would become standard shell syntax:

{ cmd1 /dev/fd/3 3>&1 >&4 4>&- | cmd2 4>&-; } 4>&1

though, as noted earlier, not all sh implementations wait for cmd1 after cmd2 has finished (though that's better than the other way round). This time, $? contains the exit status of cmd2; bash and zsh make cmd1's exit status available in ${PIPESTATUS[0]} and $pipestatus[1] respectively (see also the pipefail option in a few shells, so $? can report the failure of pipe components other than the last).
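For instance, tee >(wc -l) done by hand (a sketch; it assumes /dev/fd support, which process substitution itself relies on on most systems):

```shell
# tee (cmd1) copies stdin both to stdout and, via /dev/fd/3, into the
# pipe read by wc -l (cmd2); the shell waits for wc, the last command
# of the pipeline, so the count cannot be printed after the prompt
seq 1 3 | { tee /dev/fd/3 3>&1 >&4 4>&- | wc -l 4>&-; } 4>&1
```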

Note that yash has similar issues with its process redirection feature. cmd1 >(cmd2) would be written cmd1 /dev/fd/3 3>(cmd2) there. But cmd2 is not waited for, you can't use wait to wait for it, and its pid is not made available in $! either. You'd use the same workarounds as for bash.
