Bash – How to Propagate Errors in Process Substitution

Tags: bash, process-substitution

I want my shell scripts to fail whenever a command executed within them fails.

Typically I do that with:

set -e
set -o pipefail

(typically I add set -u also)

The problem is that none of the above works with process substitution. This code prints "ok" and exits with return code 0, while I would like it to fail:

#!/bin/bash -e
set -o pipefail
cat <(false) <(echo ok)

Is there anything equivalent to "pipefail" but for process substitution? Is there any other way to pass the output of commands to a command as if they were files, while raising an error whenever any of those programs fails?

A poor man's solution would be to detect whether those commands write to stderr (but some commands write to stderr in successful scenarios).

Another, more POSIX-compliant solution would be to use named pipes, but I need to launch those commands-that-use-process-substitution as one-liners built on the fly from compiled code, and creating named pipes would complicate things (extra commands, trapping errors to delete them, etc.).

Best Answer

You can only work around that issue, for example like this:

cat <(false || kill $$) <(echo ok)
other_command

The script's main shell (the process whose PID is $$) receives SIGTERM before the second command (other_command) can run. The echo ok command runs only "sometimes", because process substitutions are asynchronous: there is no guarantee whether kill $$ runs before or after echo ok. It is a matter of the operating system's scheduling.

Consider a bash script like this:

#!/bin/bash
set -e
set -o pipefail
cat <(echo pre) <(false || kill $$) <(echo post)
echo "you will never see this"

The output of that script can be:

$ ./script
Terminated
$ echo $?
143           # it's 128 + 15 (signal number of SIGTERM)

Or:

$ ./script
Terminated
$ pre
post

$ echo $?
143

If you try it a few times, you will see both orders in the output. In the first one the script was terminated before the other two echo commands could write to their file descriptors. In the second one the false or the kill command was probably scheduled after the echo commands.

Or, to be more precise: the kill() system call issued by the kill command, which sends SIGTERM to the shell process, was scheduled (or the signal was delivered) earlier or later than the write() system calls of the echo commands.

Either way, the script stops and the exit code is not 0. It should therefore solve your issue.
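On bash 4.4 or newer there is, as far as I know, one more option: $! is set to the PID of the most recent process substitution, so wait can collect its exit status once the consumer has finished. A minimal sketch under that assumption; it only covers the last substitution on the line, and check_last is an illustrative name, not a builtin:

```bash
#!/bin/bash
# Assumes bash >= 4.4, where $! holds the PID of the last process substitution.
# check_last is a hypothetical helper used only for this illustration.
check_last() {
    cat <(echo ok; exit 3)   # the consumer (cat) itself succeeds
    wait "$!"                # returns the exit status of the producer
}

check_last
echo "producer exit status: $?"
```

Unlike the kill $$ workaround, this lets the script decide what to do with the failure, but it cannot see any substitution other than the last one.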

Another solution is, of course, to use named pipes for this. But, it depends on your script how complex it would be to implement named pipes or the workaround above.
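For comparison, the named-pipe variant could look roughly like the sketch below. The function name cat_checked and the FIFO names a and b are made up for the example; the producers false and echo ok are the ones from the question.

```bash
#!/bin/bash
# Sketch only: cat_checked and the FIFO names are illustrative.
cat_checked() {
    local dir rc=0 pid_a pid_b
    dir=$(mktemp -d) || return 1
    mkfifo "$dir/a" "$dir/b" || { rm -rf "$dir"; return 1; }

    # Each producer runs in the background and writes to its own FIFO.
    false   >"$dir/a" & pid_a=$!
    echo ok >"$dir/b" & pid_b=$!

    # The consumer reads the FIFOs as if they were ordinary files.
    cat "$dir/a" "$dir/b" || rc=$?

    # Collect every producer's status; any failure makes the function fail.
    wait "$pid_a" || rc=$?
    wait "$pid_b" || rc=$?

    rm -rf "$dir"
    return "$rc"
}

cat_checked || echo "a producer failed" >&2
```

Because every producer is an ordinary background job, wait can report each exit status. The price is the extra setup and cleanup the question wanted to avoid.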

Related Solutions

Bash Reuse Process Substitution File

The <(…) construct creates a pipe. The pipe is passed via a file name like /dev/fd/63, but this is a special kind of file: opening it really means duplicating file descriptor 63. (See the end of this answer for more explanations.)

Reading from a pipe is a destructive operation: once you've caught a byte, you can't throw it back. So your script needs to save the output from the pipe. You can use a temporary file (preferable if the input is large) or a variable (preferable if the input is small). With a temporary file:

tmp=$(mktemp)
cat <"$1" >"$tmp"
cat <"$tmp"
grep hello <"$tmp"
sed 's/hello/world/g' <"$tmp"
rm -f "$tmp"

(You can combine the two calls to cat as tee <"$1" -- "$tmp".) With a variable:

tmp=$(cat <"$1")
printf "%s\n" "$tmp"
printf "%s\n" "$tmp" | grep hello
printf "%s\n" "$tmp" | sed 's/hello/world/g'

Note that command substitution $(…) strips all trailing newlines from the command's output. To avoid that, append an extra character and strip it afterwards.

tmp=$(cat <"$1"; echo a); tmp=${tmp%a}
printf "%s\n" "$tmp"
printf "%s\n" "$tmp" | grep hello
printf "%s\n" "$tmp" | sed 's/hello/world/g'
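To see the difference the sentinel character makes, here is a small self-contained check. The variable names raw and kept are arbitrary; the input is the string hello followed by three newlines.

```bash
#!/bin/bash
# Plain command substitution: the three trailing newlines are dropped.
raw=$(printf 'hello\n\n\n')

# With a sentinel: only the newline after "a" is dropped, then "a" is removed.
kept=$(printf 'hello\n\n\n'; echo a)
kept=${kept%a}

printf 'length without sentinel: %d\n' "${#raw}"
printf 'length with sentinel:    %d\n' "${#kept}"
```

The first length is 5 (just hello), the second is 8 (hello plus its three newlines).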

By the way, don't forget the double quotes around variable substitutions.

Bash – How Process Substitution is Implemented

Well, there are many aspects to it.

File descriptors

For each process, the kernel maintains a table of open files (it might be implemented differently, but since you cannot see it anyway, you can just assume it's a simple table). That table contains information about which file it is and where it can be found, in which mode you opened it, at which position you are currently reading or writing, and whatever else is needed to actually perform I/O operations on that file. The process never gets to read (or write) that table directly. When the process opens a file, it gets back a so-called file descriptor, which is simply an index into the table.

The directory /dev/fd and its content

On Linux, /dev/fd is actually a symbolic link to /proc/self/fd. /proc is a pseudo file system in which the kernel exposes several internal data structures through the file API (so they look like regular files, directories and symlinks to programs). In particular, it contains information about all processes (which is what gave it its name). The symbolic link /proc/self always refers to the directory associated with the currently running process (that is, the process requesting it; different processes therefore see different values). In the process's directory there is a subdirectory fd which, for each open file, contains a symbolic link whose name is the decimal representation of the file descriptor (the index into the process's file table, see the previous section) and whose target is the file it corresponds to.
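A quick way to look at this on a Linux machine is sketched below. It assumes /proc is mounted; descriptor number 3 and the temporary file are arbitrary choices for the example.

```bash
#!/bin/bash
# Linux-only sketch: relies on /proc, which other systems may not provide.
tmp=$(mktemp)
exec 3<"$tmp"                          # open the file on descriptor 3

target=$(readlink "/proc/$$/fd/3")     # the symlink points at the opened file
printf 'fd 3 -> %s\n' "$target"

same=no
[ "$target" -ef "$tmp" ] && same=yes   # is it the file we opened?
printf 'same file: %s\n' "$same"

exec 3<&-                              # close the descriptor again
rm -f "$tmp"
```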

File descriptors when creating child processes

A child process is created by a fork. A fork makes a copy of the file descriptors, which means that the child process created has the very same list of open files as the parent process does. So unless one of the open files is closed by the child, accessing an inherited file descriptor in the child will access the very same file as accessing the original file descriptor in the parent process.

Note that after a fork, you initially have two copies of the same process which differ only in the return value from the fork call (the parent gets the PID of the child, the child gets 0). Normally, a fork is followed by an exec to replace one of the copies by another executable. The open file descriptors survive that exec. Note also that before the exec, the process can do other manipulations (like closing files that the new process should not get, or opening other files).
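The inheritance can be observed from the shell itself. In this sketch the parent opens a file on descriptor 3 (an arbitrary number) and a freshly started child bash reads from it without ever opening the file by name:

```bash
#!/bin/bash
tmp=$(mktemp)
echo "inherited" >"$tmp"
exec 3<"$tmp"        # the parent opens the file on descriptor 3
rm -f "$tmp"         # the name is gone, the open descriptor is not

# bash forks and then execs a new bash; descriptor 3 survives both steps.
child_saw=$(bash -c 'cat <&3')
echo "child read: $child_saw"

exec 3<&-            # close the descriptor in the parent
```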

Unnamed pipes

An unnamed pipe is just a pair of file descriptors created on request by the kernel, so that everything written to the first file descriptor can be read from the second. The most common use is the piping construct foo | bar of bash, where the standard output of foo is replaced by the write end of the pipe, and the standard input of bar is replaced by the read end. Standard input and standard output are just the first two entries in the file table (entries 0 and 1; 2 is standard error), so replacing one just means rewriting that table entry with the data of the other file descriptor (again, the actual implementation may differ). Since the process cannot access the table directly, there is a system call to do that (dup2).
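On Linux, both ends of such a pipe can be made visible through /proc/self/fd. In this sketch the left command reports what its standard output is, and the right-hand group reports what its standard input is before copying the left command's output. The array name ends is arbitrary.

```bash
#!/bin/bash
# Linux-only sketch: /proc/self/fd shows what each descriptor refers to.
mapfile -t ends < <(
    readlink /proc/self/fd/1 | { readlink /proc/self/fd/0; cat; }
)

printf 'stdin of the reader:  %s\n' "${ends[0]}"
printf 'stdout of the writer: %s\n' "${ends[1]}"
```

Both lines name the same pipe object, which is the kernel's way of saying that the two descriptors are the two ends of one pipe.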

Process substitution

Now we have everything together to understand how the process substitution works:

The bash process creates an unnamed pipe for communication between the two processes created later.
Bash forks for the echo process. The child process (which is an exact copy of the original bash process) closes the reading end of the pipe and replaces its own standard output with the writing end of the pipe. Given that echo is a shell builtin, bash might spare itself the exec call, but it doesn't matter anyway (the shell builtin might also be disabled, in which case it execs /bin/echo).
Bash (the original, parent one) replaces the expression <(echo 1) by the pseudo file link in /dev/fd referring to the reading end of the unnamed pipe.
Bash forks for the PHP process (note that after the fork, we are still inside [a copy of] bash). The new process closes the inherited write end of the unnamed pipe (and does some other preparatory steps), but leaves the read end open. Then it execs PHP.
The PHP program receives the name in /dev/fd/. Since the corresponding file descriptor is still open, it still corresponds to the reading end of the pipe. Therefore, if the PHP program opens the given file for reading, what it actually does is create a second file descriptor for the reading end of the unnamed pipe. But that's no problem; it could read from either.
Now the PHP program can read the reading end of the pipe through the new file descriptor, and thus receive the standard output of the echo command which goes to the writing end of the same pipe.
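The steps above can be observed with a small helper that takes the place of the PHP program. This is a sketch for Linux; show is an illustrative function name, and the exact descriptor number is up to bash.

```bash
#!/bin/bash
# Linux-flavoured sketch; show is an illustrative helper.
show() {
    printf 'name:   %s\n' "$1"            # the pseudo file name bash substituted
    printf 'target: '; readlink "$1"      # on Linux this reports a pipe
    printf 'data:   '; cat "$1"           # what the producer wrote
}

show <(echo 1)
```

The helper never sees the echo command, only a file name; opening that name gives it the read end of the pipe that echo writes to.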
