A good way to grok the difference between them is to do a little experimenting on the command line. Despite the visual similarity in its use of the < character, process substitution does something very different from a redirect or pipe.
Let's use the date command for testing.
$ date | cat
Thu Jul 21 12:39:18 EEST 2011
This is a pointless example, but it shows that cat accepted the output of date on STDIN and spit it back out. The same result can be achieved with process substitution:
$ cat <(date)
Thu Jul 21 12:40:53 EEST 2011
However, what just happened behind the scenes was different. Instead of being given a STDIN stream, cat was actually passed the name of a file that it needed to open and read. You can see this step by using echo instead of cat.
$ echo <(date)
/proc/self/fd/11
When cat received the file name, it read the file's content for us. On the other hand, echo just showed us the file's name that it was passed. This difference becomes more obvious if you add more substitutions:
$ cat <(date) <(date) <(date)
Thu Jul 21 12:44:45 EEST 2011
Thu Jul 21 12:44:45 EEST 2011
Thu Jul 21 12:44:45 EEST 2011
$ echo <(date) <(date) <(date)
/proc/self/fd/11 /proc/self/fd/12 /proc/self/fd/13
It is possible to combine process substitution (which generates a file) and input redirection (which connects a file to STDIN):
$ cat < <(date)
Thu Jul 21 12:46:22 EEST 2011
It looks pretty much the same, but this time cat was passed a STDIN stream instead of a file name. You can see this by trying it with echo:
$ echo < <(date)
<blank>
Since echo doesn't read STDIN and no argument was passed, we get nothing.
Pipes and input redirects shove content onto the STDIN stream. Process substitution runs the commands, connects their output to a special temporary file and then passes that file's name in place of the command. Whatever command you are using treats it as a file name. Note that the file is not a regular file: it is a pipe (a /proc/self/fd entry in the examples above, or a named pipe on some systems) that goes away automatically once it is no longer needed.
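Because each substitution expands to a file name, one command can take input from several commands at once, which a single pipe cannot do. The following is a minimal sketch for bash; the functions list_a and list_b and their data are made up for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical sample data, for illustration only.
list_a() { printf '%s\n' cherry apple banana; }
list_b() { printf '%s\n' banana date apple; }

# comm expects two file names; each <(...) expands to one.
# -12 hides the lines unique to either input, leaving only the common ones.
common=$(comm -12 <(list_a | sort) <(list_b | sort))
printf '%s\n' "$common"
```

With these inputs the common lines are apple and banana.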
Process substitution is a feature that originated in the Korn shell in the 80s (in ksh86). At the time, it was only available on systems that had support for /dev/fd/<n> files.
Later, the feature was added to zsh (from the start: 1990) and bash (in 1993). zsh was using temporary named pipes to implement it, while bash was using /dev/fd/<n> where available and named pipes otherwise. zsh switched to using /dev/fd/<n> where available in 2.6-beta17 in 1996.
Support for process substitution via named pipes on systems without /dev/fd was only added to ksh in ksh93u+ in 2012. The public domain clone of ksh doesn't support it.
To my knowledge, no other Bourne-like shell supports it (rc, es and fish, which are not Bourne-like, support it but with a different syntax). yash has a <(...) construct, but that's for process redirection.
While quite useful, the feature was never standardized by POSIX. So one can't expect to find it in sh, and shouldn't use it in a sh script.
Though the behaviour for <(...) is unspecified in POSIX (so there would be no harm in retaining it), bash disables the feature when called as sh or when called with POSIXLY_CORRECT=1 in its environment.
So, if you have a script that uses <(...), you should use a shell that supports the feature, like zsh, bash or AT&T ksh, to interpret it (of course, you need to make sure the rest of the script's syntax is also compatible with that shell).
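If a script has to find out at run time whether the current shell accepts <(...), one possible probe is sketched below. It is only an illustration: the variable name has_procsubst is made up, and the probe is written with POSIX syntax so that it parses in a shell without the feature. The eval runs in a subshell because a syntax error inside eval would otherwise abort a non-interactive shell.

```bash
# Probe in a subshell: a shell that cannot parse <(...) fails inside eval
# and only the subshell exits. Counting the arguments (instead of merely
# running a command) is meant to reject a shell that parses <(...) as a
# redirection, which would presumably leave "set --" with no arguments.
if (eval 'set -- <(:); [ "$#" -eq 1 ]') 2>/dev/null; then
  has_procsubst=yes
else
  has_procsubst=no
fi
printf 'process substitution supported: %s\n' "$has_procsubst"
```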
In any case:
cat <(cmd)
Can be written:
cmd | cat
Or just
cmd
For a command other than cat (one that needs to be passed data via a file given as argument), on systems with /dev/fd/x, you can always do:
something | that-cmd /dev/stdin
Or if you need that-cmd's stdin to be preserved:
{ something 3<&- | that-cmd /dev/fd/4 4<&0 <&3 3<&-; } 3<&0
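To make the two forms above concrete, here is a small sketch that should run on a system with /dev/stdin and /dev/fd (Linux, for instance). The function something is a stand-in, and sort and cat play the role of that-cmd; none of these choices come from the original answer.

```bash
# Stand-in for "something": any command that writes to stdout.
something() { printf '%s\n' pear apple; }

# sort is given a file name argument; /dev/stdin refers to the pipe.
sorted=$(something | sort /dev/stdin)
printf '%s\n' "$sorted"

# Variant that keeps the original stdin usable by the consumer.
# fd 3 saves the outer stdin, the pipe from "something" is moved to fd 4,
# and stdin is restored from fd 3 before cat starts.
# cat first reads the pipe by name (/dev/fd/4), then its real stdin (-).
both=$(printf 'from stdin\n' |
  { something 3<&- | cat /dev/fd/4 - 4<&0 <&3 3<&-; } 3<&0)
printf '%s\n' "$both"
```

The first command yields apple and pear in sorted order; the second yields pear, apple and then the line that arrived on the original stdin.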
Best Answer
Well, there are many aspects to it.
File descriptors
For each process, the kernel maintains a table of open files (well, it might be implemented differently, but since you are not able to see it anyway, you can just assume it's a simple table). That table contains information about which file it is and where it can be found, in which mode you opened it, at which position you are currently reading or writing, and whatever else is needed to actually perform I/O operations on that file. Now the process never gets to read (or even write) that table. When the process opens a file, it gets back a so-called file descriptor, which is simply an index into the table.
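The shell exposes file descriptors directly, which makes the table easy to observe. In this sketch (the temporary file and the choice of descriptor 3 are arbitrary), two separate reads through the same descriptor return consecutive lines, because the read position is stored with the open file rather than with the read command.

```bash
tmp=$(mktemp)
printf 'first\nsecond\n' > "$tmp"

exec 3< "$tmp"            # open the file for reading as descriptor 3
IFS= read -r line1 <&3    # reads "first"
IFS= read -r line2 <&3    # same descriptor, position has advanced: "second"
exec 3<&-                 # close descriptor 3

rm -f "$tmp"
printf '%s / %s\n' "$line1" "$line2"
```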
The directory /dev/fd and its content
On Linux, /dev/fd is actually a symbolic link to /proc/self/fd. /proc is a pseudo file system in which the kernel maps several internal data structures to be accessed with the file API (so they just look like regular files/directories/symlinks to the programs). In particular, there's information about all processes (which is what gave it the name). The symbolic link /proc/self always refers to the directory associated with the currently running process (that is, the process requesting it; different processes will therefore see different values). In the process's directory, there's a subdirectory fd which, for each open file, contains a symbolic link whose name is just the decimal representation of the file descriptor (the index into the process's file table, see previous section), and whose target is the file it corresponds to.
File descriptors when creating child processes
A child process is created by a fork. A fork makes a copy of the file descriptors, which means that the child process has the very same list of open files as the parent process does. So unless one of the open files is closed by the child, accessing an inherited file descriptor in the child will access the very same file as accessing the original file descriptor in the parent process.
Note that after a fork, you initially have two copies of the same process which differ only in the return value from the fork call (the parent gets the PID of the child, the child gets 0). Normally, a fork is followed by an exec to replace one of the copies by another executable. The open file descriptors survive that exec. Note also that before the exec, the process can do other manipulations (like closing files that the new process should not get, or opening other files).
Unnamed pipes
An unnamed pipe is just a pair of file descriptors created on request by the kernel, so that everything written to the first file descriptor is passed to the second. The most common use is for the piping construct foo | bar of bash, where the standard output of foo is replaced by the write end of the pipe, and the standard input of bar is replaced by the read end. Standard input and standard output are just the first two entries in the file table (entries 0 and 1; 2 is standard error), and therefore replacing one means just rewriting that table entry with the data corresponding to the other file descriptor (again, the actual implementation may differ). Since the process cannot access the table directly, there's a kernel function to do that.
Process substitution
Now we have everything together to understand how the process substitution works:
1. bash creates an unnamed pipe and forks a child for the echo process. The child process (which is an exact copy of the original bash process) closes the reading end of the pipe and replaces its own standard output with the writing end of the pipe. Given that echo is a shell builtin, bash might spare itself the exec call, but it doesn't matter anyway (the shell builtin might also be disabled, in which case it execs /bin/echo).
2. bash replaces the expression <(echo 1) by the pseudo file link in /dev/fd referring to the reading end of the unnamed pipe.
3. The PHP program is started and receives the file name in /dev/fd/. Since the corresponding file descriptor is still open, it still corresponds to the reading end of the pipe. Therefore, if the PHP program opens the given file for reading, what it actually does is create a second file descriptor for the reading end of the unnamed pipe. But that's no problem, it could read from either.
4. What the PHP program reads is the output of the echo command, which goes to the writing end of the same pipe.
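The same mechanics can be reproduced by hand with an ordinary pipe, which may help to check the description above. This is only a sketch, assuming a system that provides /dev/fd (such as Linux); cat stands in for the PHP program, and descriptor 3 is an arbitrary choice.

```bash
# echo writes into an unnamed pipe. On the reading side, the pipe's read end
# is duplicated to descriptor 3 and stdin is pointed at /dev/null, so the
# only way for cat to reach the data is through the *name* /dev/fd/3,
# much like a program that is handed the expansion of <(echo 1).
result=$(echo 1 | cat /dev/fd/3 3<&0 </dev/null)
printf '%s\n' "$result"
```

Opening /dev/fd/3 gives cat a second descriptor for the same read end, which is the situation described in the steps above.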