Performance: Pipelines vs Process Substitution

bash performance pipe process-substitution shell

I tend to use pipelines over process substitution in my bash scripts in most situations, especially when chaining several commands, since ... | ... | ... seems more readable than ... < <(... < <(...)).

I'm wondering, though, why process substitution is much faster than a pipeline in some situations.

To test this, I timed two scripts that run the same commands for 10000 iterations, one using a pipeline and the other using process substitution.

Scripts:

pipeline.bash:

for i in {1..10000}; do
    echo foo bar |
    while read; do
        echo $REPLY >/dev/null
    done
done

proc-sub.bash:

for i in {1..10000}; do
    while read; do
        echo $REPLY >/dev/null
    done < <(echo foo bar)
done

Results:

~$ time ./pipeline.bash

real    0m17.678s
user    0m14.666s
sys     0m14.807s

~$ time ./proc-sub.bash

real    0m8.479s
user    0m4.649s
sys     0m6.358s

I know that pipelines create a subprocess, whereas process substitution creates a named pipe or some file in /dev/fd, but I am unclear about how those differences affect performance.
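To reproduce the comparison from a single file, something like the following sketch may help. The function names run_pipeline and run_procsub and the ITERATIONS variable are made up for illustration; the loop bodies mirror the two scripts above, with read -r and a quoted "$REPLY" added. Timings will vary by machine.

```bash
#!/usr/bin/env bash
# Hypothetical harness: both variants in one file, timed with bash's `time` keyword.
# ITERATIONS is an illustrative knob, not part of the original scripts.
iterations=${ITERATIONS:-1000}

run_pipeline() {
    local i
    for ((i = 0; i < iterations; i++)); do
        # The loop is the right-hand side of a pipeline.
        echo foo bar |
        while read -r; do
            echo "$REPLY" >/dev/null
        done
    done
}

run_procsub() {
    local i
    for ((i = 0; i < iterations; i++)); do
        # The loop reads from a process substitution via a redirect.
        while read -r; do
            echo "$REPLY" >/dev/null
        done < <(echo foo bar)
    done
}

# `time` reports real/user/sys for each function on stderr.
time run_pipeline
time run_procsub
```

Running it as ITERATIONS=10000 ./harness.bash (the file name is arbitrary) should correspond to the iteration count used in the question.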

Best Answer

Running both scripts under strace shows the difference:

With pipe:

$ strace -c ./pipe.sh 
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 57.89    0.103005           5     20000           clone
 40.81    0.072616           2     30000     10000 wait4
  0.58    0.001037           0    120008           rt_sigprocmask
  0.40    0.000711           0     10000           pipe

With proc-sub:

$ strace -c ./procsub.sh 
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 85.08    0.045502           5     10000           clone
  3.25    0.001736           0     90329       322 read
  2.12    0.001133           0     20009           open
  2.03    0.001086           0     50001           dup2

From the statistics above, you can see that the pipeline creates more child processes (the clone syscall) and spends much more time waiting for them to finish (the wait4 syscall) before the parent process can continue executing.

Process substitution does not. It can read directly from the child process. Process substitution is performed at the same time as parameter and variable expansion, and the command inside the process substitution runs in the background. From the bash man page:

Process Substitution
       Process  substitution  is supported on systems that support named pipes
       (FIFOs) or the /dev/fd method of naming open files.  It takes the  form
       of  <(list) or >(list).  The process list is run with its input or out‐
       put connected to a FIFO or some file in /dev/fd.  The name of this file
       is  passed  as  an argument to the current command as the result of the
       expansion.  If the >(list) form is used, writing to the file will  pro‐
       vide  input  for list.  If the <(list) form is used, the file passed as
       an argument should be read to obtain the output of list.

       When available, process substitution is performed  simultaneously  with
       parameter  and variable expansion, command substitution, and arithmetic
       expansion.
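One way to see the extra fork without strace is to compare process IDs. This is only a sketch: main_pid, loop_in_pipeline and loop_in_procsub are names invented here, and it assumes bash's default settings (in particular, lastpipe is off).

```bash
#!/usr/bin/env bash
# Remember the PID of the shell that runs this script.
main_pid=$BASHPID

# Print whether the current code runs in the main shell or in a forked child.
where_am_i() {
    if [ "$BASHPID" = "$main_pid" ]; then
        echo "same process"
    else
        echo "forked child"
    fi
}

# Pipeline: the while loop is one element of the pipeline, so bash forks for it
# (in addition to the fork for `echo foo bar`).
loop_in_pipeline() {
    echo foo bar |
    while read -r; do
        where_am_i
    done
}

# Process substitution: only `echo foo bar` is forked; the loop stays in the
# shell that called the function.
loop_in_procsub() {
    while read -r; do
        where_am_i
    done < <(echo foo bar)
}

loop_in_pipeline   # prints "forked child"
loop_in_procsub    # prints "same process"
```

That is consistent with the clone counts above: 20000 for the pipeline versus 10000 for process substitution over 10000 iterations.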

Update

Running strace again, this time including statistics from child processes (-f):

With pipe:

$ strace -fqc ./pipe.sh 
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 70.76    0.215739           7     30000     10000 wait4
 28.04    0.085490           4     20000           clone
  0.78    0.002374           0    220008           rt_sigprocmask
  0.17    0.000516           0    110009     20000 close
  0.15    0.000456           0     10000           pipe

With proc-sub:

$ strace -fqc ./procsub.sh 
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 52.38    0.033977           3     10000           clone
 32.24    0.020913           0     96070      6063 read
  5.24    0.003398           0     20009           open
  2.34    0.001521           0    110003     10001 fcntl
  1.87    0.001210           0    100009           close

Related Solutions

Shell – Process Substitution and Pipe

A good way to grok the difference between them is to experiment a little on the command line. Despite the visual similarity in the use of the < character, process substitution does something very different from a redirect or a pipe.

Let's use the date command for testing.

$ date | cat
Thu Jul 21 12:39:18 EEST 2011

This is a pointless example but it shows that cat accepted the output of date on STDIN and spit it back out. The same results can be achieved by process substitution:

$ cat <(date)
Thu Jul 21 12:40:53 EEST 2011

However, what happened behind the scenes was different. Instead of being given a STDIN stream, cat was actually passed the name of a file that it needed to open and read. You can see this step by using echo instead of cat.

$ echo <(date)
/proc/self/fd/11

When cat received the file name, it read the file's content for us. On the other hand, echo just showed us the file's name that it was passed. This difference becomes more obvious if you add more substitutions:

$ cat <(date) <(date) <(date)
Thu Jul 21 12:44:45 EEST 2011
Thu Jul 21 12:44:45 EEST 2011
Thu Jul 21 12:44:45 EEST 2011

$ echo <(date) <(date) <(date)
/proc/self/fd/11 /proc/self/fd/12 /proc/self/fd/13

It is possible to combine process substitution (which generates a file) and input redirection (which connects a file to STDIN):

$ cat < <(date)
Thu Jul 21 12:46:22 EEST 2011

It looks pretty much the same, but this time cat was given a STDIN stream instead of a file name. You can see this by trying it with echo:

$ echo < <(date)
<blank>

Since echo doesn't read STDIN and no argument was passed, we get nothing.

Pipes and input redirects shove content onto the STDIN stream. Process substitution runs the commands, connects their output to a special file, and passes that file's name in place of the command. Whatever command you are using treats it as a file name. Note that this is not a regular file holding saved output but a pipe (a named pipe, or an entry under /dev/fd, depending on the system) that goes away automatically once it is no longer needed.
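The same distinction can be checked from a script. The sketch below uses a made-up helper, describe, to test what kind of file the substituted name refers to, and then uses wc to show the difference between receiving a file name and receiving STDIN. The exact path printed by wc depends on the system.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report what kind of file the given path refers to.
describe() {
    if [ -p "$1" ]; then
        echo "pipe"
    elif [ -f "$1" ]; then
        echo "regular file"
    else
        echo "something else"
    fi
}

# The argument is a path that refers to a pipe, not to a regular file.
describe <(date)

# As an argument, wc sees a file name and prints it after the line count.
wc -l <(printf 'a\nb\n')

# Through a redirect, wc only sees STDIN, so it prints the count alone.
wc -l < <(printf 'a\nb\n')
```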

Bash Pipes – Pipes vs Process Substitution

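Variables in a pipe never make it out of the pipe alive :)
Process substitution redirects the data to a file descriptor. Behind the scenes, that is not the same as a | pipe: each part of a pipeline runs in its own subshell, so variables set there are lost when the pipeline ends. The following works because the loop and the echo are grouped inside the same part of the pipe.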

unset REPLY
cat test.txt | { 
  while read ;do : ;done
  echo "$REPLY" 
} # Prints foo!
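For contrast, here is a minimal sketch of the case that does not work, next to the process-substitution version. It assumes bash's lastpipe option is off, which is the default; count, pipeline_count and procsub_count are illustrative names.

```bash
#!/usr/bin/env bash
# Pipeline: the loop runs in a subshell, so its changes to `count` are lost.
count=0
printf 'a\nb\nc\n' | while read -r; do count=$((count + 1)); done
pipeline_count=$count

# Process substitution: the loop runs in the current shell, so `count` survives.
count=0
while read -r; do count=$((count + 1)); done < <(printf 'a\nb\nc\n')
procsub_count=$count

echo "pipeline: $pipeline_count, process substitution: $procsub_count"
# prints "pipeline: 0, process substitution: 3"
```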
