POSIX Shell – Portable Way to Achieve Process Substitution


Some shells, such as bash, support process substitution, a way to present a process's output as a file, like this:

$ diff <(sort file1) <(sort file2)

However, this construct isn't POSIX and is therefore not portable. How can process substitution be achieved in a POSIX-friendly manner (i.e. one which works in /bin/sh)?

Note: the question isn't asking how to diff two sorted files; that is only a contrived example to demonstrate process substitution.

Best Answer

That feature was introduced by ksh (first documented in ksh86) and made use of the /dev/fd/n feature (added independently in some BSDs and AT&T systems earlier). In ksh, up to and including ksh93u, it wouldn't work unless your system had support for /dev/fd/n. zsh, bash, and ksh93u+ and above can fall back to temporary named pipes (named pipes were added in System III) where /dev/fd/n is not available.

On systems where /dev/fd/n is available (POSIX doesn't specify those), you can do process substitution (e.g., diff <(cmd1) <(cmd2)) yourself with:

{
  cmd1 4<&- | {
    # in here fd 3 points to the reading end of the pipe
    # from cmd1, while fd 0 has been restored from the original
    # stdin (saved on fd 4, now closed as no longer needed)

    cmd2 3<&- | diff /dev/fd/3 -

  } 3<&0 <&4 4<&- # restore the original stdin for cmd2

} 4<&0 # save a copy of stdin for cmd2

However, that doesn't work with ksh93 on Linux: there, shell pipes are implemented with socketpairs instead of pipes, and opening /dev/fd/3 when fd 3 refers to a socket doesn't work on Linux.
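To make the pattern above concrete, here is a minimal sketch that stands in for paste -d: <(printf '1\n2\n') <(printf 'a\nb\n'). It assumes /dev/fd/n is available and that the shell implements pipelines with real pipes. The function name pair and the sample data are made up for illustration.

```shell
# Assumption: /dev/fd/N exists (not guaranteed by POSIX) and stdin is open.
# "pair" is a hypothetical name; the two printf calls stand in for cmd1/cmd2.
pair() {
  {
    # First producer: its output ends up on fd 3 of the inner group.
    printf '1\n2\n' 4<&- | {
      # Second producer feeds paste on stdin ("-"); the first one is
      # read through /dev/fd/3.
      printf 'a\nb\n' 3<&- | paste -d: /dev/fd/3 -
    } 3<&0 <&4 4<&-   # move the pipe to fd 3, restore the original stdin
  } 4<&0              # save a copy of the original stdin on fd 4
}

pair
```

Each line of the first producer is joined with the matching line of the second, which is what the bash form with two <(...) arguments would produce.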

Though POSIX doesn't specify /dev/fd/n, it does specify named pipes. Named pipes work like normal pipes except that you can access them from the file system. The issue is that you have to create temporary ones and clean up afterwards, which is hard to do reliably. POSIX has no standard mechanism (like the mktemp -d found on some systems) to create temporary files or directories, and signal handling (to clean up upon hang-up or kill) is also hard to do portably.

You could do something like:

tmpfifo() (
  n=0
  until
    fifo=$1.$$.$n
    mkfifo -m 600 -- "$fifo" 2> /dev/null
  do
    n=$((n + 1))
    # give up after 20 attempts as it could be a permanent condition
    # that prevents us from creating fifos. You'd need to raise that
    # limit if you intend to create (and use at the same time)
    # more than 20 fifos in your script
    [ "$n" -lt 20 ] || exit 1
  done
  printf '%s\n' "$fifo"
)

cleanup() { rm -f -- "$fifo"; }

fifo=$(tmpfifo /tmp/fifo) || exit

cmd2 > "$fifo" & cmd1 | diff - "$fifo"

cleanup

(Signal handling is not taken care of here.)
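As a rough sketch of what the missing signal handling could look like, the variant below creates the fifo inside a private directory and removes it from an EXIT trap, turning HUP, INT and TERM into ordinary exits so that the trap runs. This is one possible arrangement rather than a complete solution: the function name compare, the directory name psub.$$ and the stand-in definitions of cmd1 and cmd2 are assumptions made for the example, and SIGKILL still cannot be caught.

```shell
# Hypothetical stand-ins for the two commands whose output is compared.
cmd1() { printf '%s\n' b a c | sort; }
cmd2() { printf '%s\n' c b a | sort; }

# Runs in a subshell, so the traps and variables stay local to it.
compare() (
  dir=${TMPDIR:-/tmp}/psub.$$          # assumed naming scheme
  # mkdir fails if the name already exists, so a directory we did not
  # create is never used (or removed) by mistake.
  mkdir -m 700 -- "$dir" || exit 2
  trap 'rm -rf -- "$dir"' EXIT         # installed only once we own $dir
  trap 'exit 129' HUP                  # turn signals into normal exits
  trap 'exit 130' INT                  # so that the EXIT trap runs
  trap 'exit 143' TERM
  fifo=$dir/fifo
  mkfifo -m 600 -- "$fifo" || exit 2
  cmd2 > "$fifo" &
  cmd1 | diff - "$fifo"
  status=$?
  wait                                 # reap the background writer
  exit "$status"
)

compare && echo same
```

Creating the fifo inside a mode 700 directory avoids racing with other users over names in /tmp, which is the main thing mktemp -d would otherwise provide.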

Related Solutions

Shell – Process Substitution and Pipe

A good way to grok the difference between process substitution and a pipe is to do a little experimenting on the command line. In spite of the visual similarity in the use of the < character, process substitution does something very different from a redirect or a pipe.

Let's use the date command for testing.

$ date | cat
Thu Jul 21 12:39:18 EEST 2011

This is a pointless example but it shows that cat accepted the output of date on STDIN and spit it back out. The same results can be achieved by process substitution:

$ cat <(date)
Thu Jul 21 12:40:53 EEST 2011

However what just happened behind the scenes was different. Instead of being given a STDIN stream, cat was actually passed the name of a file that it needed to go open and read. You can see this step by using echo instead of cat.

$ echo <(date)
/proc/self/fd/11

When cat received the file name, it read the file's content for us. On the other hand, echo just showed us the file's name that it was passed. This difference becomes more obvious if you add more substitutions:

$ cat <(date) <(date) <(date)
Thu Jul 21 12:44:45 EEST 2011
Thu Jul 21 12:44:45 EEST 2011
Thu Jul 21 12:44:45 EEST 2011

$ echo <(date) <(date) <(date)
/proc/self/fd/11 /proc/self/fd/12 /proc/self/fd/13

It is possible to combine process substitution (which generates a file) and input redirection (which connects a file to STDIN):

$ cat < <(date)
Thu Jul 21 12:46:22 EEST 2011

It looks pretty much the same, but this time cat was passed a STDIN stream instead of a file name. You can see this by trying it with echo:

$ echo < <(date)
<blank>

Since echo doesn't read STDIN and no argument was passed, we get nothing.

Pipes and input redirects feed content to the STDIN stream. Process substitution runs the command with its output connected to a pipe, and passes a file name referring to that pipe in place of the substitution. Whatever command you are using treats it as a file name. Note that this is not a regular file: it is either a /dev/fd (or /proc/self/fd) entry for an anonymous pipe, as in the output above, or a temporary named pipe that gets removed automatically once it is no longer needed.

Bash Pipes – Pipes vs Process Substitution

Variables in a pipe never make it out of the pipe alive :)
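Each part of a | pipeline may run in its own subshell, so variables set there are lost when the pipeline ends. Process substitution instead exposes the data through a file descriptor, which is not the same mechanism as a | pipe. The following works because the loop and the echo both run inside the same part of the pipeline.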

unset REPLY
cat test.txt | { 
  while read ;do : ;done
  echo "$REPLY" 
} # Prints foo!
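When the result has to survive the loop, one portable option is to keep the loop out of the pipeline by feeding it from a here-document that wraps a command substitution. This is a minimal sketch and not part of the original answer; the variable names and the sample input are made up. Unlike a real pipe, the shell collects the whole output before the loop starts.

```shell
# The loop runs in the current shell, so "total" is still set afterwards.
total=0
while read -r n; do
  total=$((total + n))
done <<EOF
$(printf '%s\n' 1 2 3)
EOF

echo "$total"
```

In bash the same effect is usually obtained with done < <(command), which is the process-substitution form this page is about.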
