The <(…) construct creates a pipe. The pipe is passed via a file name like /dev/fd/63, but this is a special kind of file: opening it really means duplicating file descriptor 63. (See the end of this answer for more explanations.)
Reading from a pipe is a destructive operation: once you've caught a byte, you can't throw it back. So your script needs to save the output from the pipe. You can use a temporary file (preferable if the input is large) or a variable (preferable if the input is small). With a temporary file:
tmp=$(mktemp)
cat <"$1" >"$tmp"
cat <"$tmp"
grep hello <"$tmp"
sed 's/hello/world/g' <"$tmp"
rm -f "$tmp"
(You can combine the two calls to cat as tee <"$1" -- "$tmp".) With a variable:
tmp=$(cat <"$1")
printf "%s\n" "$tmp"
printf "%s\n" "$tmp" | grep hello
printf "%s\n" "$tmp" | sed 's/hello/world/g'
Note that command substitution $(…) strips all newlines at the end of the command's output. To avoid that, add an extra character and strip it afterwards:
tmp=$(cat <"$1"; echo a); tmp=${tmp%a}
printf "%s\n" "$tmp"
printf "%s\n" "$tmp" | grep hello
printf "%s\n" "$tmp" | sed 's/hello/world/g'
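To see the effect of the extra character, here is a small check. It is only a sketch: the sample input and the variable names plain and kept are made up for illustration.

```shell
# Sample input (an assumption for this sketch): "hello" followed by two newlines.
plain=$(printf 'hello\n\n')                           # trailing newlines are stripped
kept=$(printf 'hello\n\n'; echo a); kept=${kept%a}    # the sentinel "a" protects them

printf 'plain: %s characters\n' "${#plain}"
printf 'kept:  %s characters\n' "${#kept}"
```

plain ends up with 5 characters, kept with 7, because the two newlines survive behind the sentinel.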
By the way, don't forget the double quotes around variable substitutions.
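The destructive nature of reading from a pipe can be observed directly. The following is a minimal sketch with made-up variable names (first, second, result), not part of the original script:

```shell
# Both reads happen inside one pipeline stage, so they share the same pipe.
result=$(printf 'hello\n' | {
  first=$(cat)     # consumes everything the pipe will ever deliver
  second=$(cat)    # the pipe is already drained, so this sees end-of-file
  printf 'first=[%s] second=[%s]' "$first" "$second"
})
printf '%s\n' "$result"
```

The second cat gets nothing, which is why the data has to be saved to a file or a variable before it is used more than once.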
Well, there are many aspects to it.
File descriptors
For each process, the kernel maintains a table of open files (it might be implemented differently, but since you cannot see it anyway, you can just assume it's a simple table). That table records which file each entry refers to and where it can be found, in which mode it was opened, the current read/write position, and whatever else is needed to actually perform I/O operations on that file. The process never gets to read (or write) that table directly. When the process opens a file, it gets back a so-called file descriptor, which is simply an index into the table.
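From the shell, a file descriptor is visible as a plain number in redirections. A minimal sketch, assuming descriptor 3 is free and using a throwaway temporary file (both choices are only for illustration):

```shell
tmpfile=$(mktemp)
printf 'first\nsecond\n' >"$tmpfile"

exec 3<"$tmpfile"    # open the file for reading; 3 is the descriptor (table index)
read -r line1 <&3    # read one line through descriptor 3
read -r line2 <&3    # the position is remembered, so this continues with line 2
exec 3<&-            # close descriptor 3 again
rm -f "$tmpfile"

printf '%s, then %s\n' "$line1" "$line2"
```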
The directory /dev/fd and its content
On Linux, /dev/fd is actually a symbolic link to /proc/self/fd. /proc is a pseudo file system in which the kernel exposes several internal data structures through the file API (so they look like regular files, directories, and symlinks to programs). In particular, it contains information about all processes (which is what gave it its name). The symbolic link /proc/self always refers to the directory associated with the currently running process (that is, the process requesting it; different processes therefore see different values). In the process's directory, there's a subdirectory fd which contains, for each open file, a symbolic link whose name is the decimal representation of the file descriptor (the index into the process's file table, see the previous section) and whose target is the file it corresponds to.
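The following sketch pokes at these entries from a shell. It assumes descriptor 3 is free and that /dev/fd exists; the readlink part additionally assumes a Linux-style /proc and is skipped quietly elsewhere. The variable names are invented for the example.

```shell
tmpfile=$(mktemp)
printf 'payload\n' >"$tmpfile"

exec 3<"$tmpfile"              # descriptor 3 now refers to the temporary file
via_devfd=$(cat /dev/fd/3)     # opening /dev/fd/3 reaches the same file

# Linux-specific: readlink inherits descriptor 3, so its own /proc/self/fd/3
# is a symlink to the file behind that descriptor.
link_target=$(readlink /proc/self/fd/3 2>/dev/null) || link_target=

exec 3<&-
rm -f "$tmpfile"

printf 'read via /dev/fd/3: %s\n' "$via_devfd"
printf 'symlink target:     %s\n' "${link_target:-unavailable}"
```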
File descriptors when creating child processes
A child process is created by a fork. A fork makes a copy of the file descriptors, which means that the child process has the very same list of open files as the parent process. So unless one of the open files is closed by the child, accessing an inherited file descriptor in the child accesses the very same file as accessing the original file descriptor in the parent process.
Note that after a fork, you initially have two copies of the same process which differ only in the return value of the fork call (the parent gets the PID of the child, the child gets 0). Normally, a fork is followed by an exec to replace one of the copies with another executable. The open file descriptors survive that exec. Note also that before the exec, the process can do other manipulations (like closing files that the new process should not get, or opening other files).
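Inheritance across fork and exec can be tried out with a subshell (a fork) and an external sh (a fork followed by an exec). This is a sketch assuming descriptor 3 is free; the file and the messages are made up.

```shell
tmpfile=$(mktemp)
exec 3>"$tmpfile"                     # parent opens descriptor 3 for writing

( echo 'from subshell' >&3 )          # fork only: the subshell inherits descriptor 3
sh -c 'echo "from exec-ed sh" >&3'    # fork + exec: descriptor 3 survives the exec
echo 'from parent' >&3                # the parent continues on the same open file

exec 3>&-
inherited_output=$(cat "$tmpfile")
rm -f "$tmpfile"
printf '%s\n' "$inherited_output"
```

All three lines end up in the same file, one after the other, because every process wrote through the same inherited descriptor.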
Unnamed pipes
An unnamed pipe is just a pair of file descriptors created on request by the kernel, so that everything written to one file descriptor can be read from the other. The most common use is for the piping construct foo | bar of bash, where the standard output of foo is replaced by the write end of the pipe, and the standard input of bar is replaced by the read end. Standard input and standard output are just the first two entries in the file table (entries 0 and 1; 2 is standard error), and therefore replacing one means just rewriting that table entry with the data corresponding to the other file descriptor (again, the actual implementation may differ). Since the process cannot access the table directly, there's a kernel function to do that.
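That both sides of foo | bar really end up with a pipe on descriptor 1 and 0 respectively can be checked with the -p file test. A small sketch, assuming the system provides /dev/stdin and /dev/stdout (as Linux does); the variable names are invented:

```shell
# Left side of a pipeline: its standard output is the write end of a pipe.
writer_side=$( { [ -p /dev/stdout ] && echo 'stdout is a pipe'; } | cat )

# Right side of a pipeline: its standard input is the read end of a pipe.
reader_side=$( echo | { [ -p /dev/stdin ] && echo 'stdin is a pipe'; } )

printf '%s\n%s\n' "$writer_side" "$reader_side"
```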
Process substitution
Now we have everything together to understand how the process substitution works:
- The bash process creates an unnamed pipe for communication between the two processes created later.
- Bash forks for the echo process. The child process (which is an exact copy of the original bash process) closes the reading end of the pipe and replaces its own standard output with the writing end of the pipe. Given that echo is a shell builtin, bash might spare itself the exec call, but it doesn't matter anyway (the shell builtin might also be disabled, in which case it execs /bin/echo).
- Bash (the original, parent one) replaces the expression <(echo 1) by the pseudo file link in /dev/fd referring to the reading end of the unnamed pipe.
- Bash forks for the PHP process (note that after the fork, we are still inside [a copy of] bash). The new process closes the inherited write end of the unnamed pipe (and does some other preparatory steps), but leaves the read end open. Then it execs PHP.
- The PHP program receives the name in /dev/fd/. Since the corresponding file descriptor is still open, it still corresponds to the reading end of the pipe. Therefore if the PHP program opens the given file for reading, what it actually does is to create a second file descriptor for the reading end of the unnamed pipe. But that's no problem, it could read from either.
- Now the PHP program can read the reading end of the pipe through the new file descriptor, and thus receive the standard output of the echo command, which goes to the writing end of the same pipe.
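The steps above can be observed without PHP by letting echo and cat play the role of the receiving program. This is only a sketch: it assumes bash is installed and that the system has /dev/fd (otherwise bash falls back to named pipes and the printed path looks different).

```shell
# Process substitution is a bash feature, so the snippets run through bash explicitly.
fd_path=$(bash -c 'echo <(echo 1)')    # what the command sees in place of <(echo 1)
content=$(bash -c 'cat <(echo 1)')     # what reading that path yields

printf 'argument seen by the command: %s\n' "$fd_path"
printf 'data read through it:         %s\n' "$content"
```

The first line shows a path under /dev/fd, and the second shows the output of the inner echo, read through that path.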
Best Answer
You can only work around that issue, for example like this:
The subshell of the script is SIGTERMed before the second command (other_command) can be executed. The echo ok command is executed "sometimes": the problem is that process substitutions are asynchronous. There's no guarantee that the kill $$ command is executed before or after the echo ok command. It's a matter of the operating system's scheduling.
Consider a bash script like this:
The output of that script can be:
Or:
You can try it, and after a few tries you will see the two different orders in the output. In the first one, the script was terminated before the other two echo commands could write to the file descriptor. In the second one, the false or the kill command was probably scheduled after the echo commands.
Or, to be more precise: the kill() system call of the kill utility, which sends the SIGTERM signal to the shell's process, was scheduled (or delivered) later or earlier than the echo write() syscalls.
Either way, the script stops and the exit code is not 0. It should therefore solve your issue.
Another solution is, of course, to use named pipes for this. How complex the named pipes or the workaround above would be to implement depends on your script.
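A rough sketch of the named-pipe variant follows. Everything in it is an assumption made for illustration: the producer is a hypothetical stand-in that writes one line and then fails with status 3, and cat stands in for whatever command consumes the data. The point is only that a background job writing to a named pipe can be waited for, so its exit status is available to the script.

```shell
workdir=$(mktemp -d)
fifo=$workdir/pipe
mkfifo "$fifo"

# Hypothetical producer: writes one line, then fails with status 3.
( echo 'data'; exit 3 ) >"$fifo" &
producer_pid=$!

consumed=$(cat <"$fifo")    # stand-in for the command that reads the input

# Unlike a process substitution, the background job can be waited for.
producer_status=0
wait "$producer_pid" || producer_status=$?
rm -rf "$workdir"

printf 'consumed=%s producer_status=%s\n' "$consumed" "$producer_status"
```

The script can then test producer_status and exit with a non-zero code itself if the producer failed.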