Bash Error-Handling – Capture Exit Code and Handle Errors in Process Substitution

bash, error handling, exit, process-substitution

I have a script that parses file names into an array using the following method taken from a Q&A on SO:

unset ARGS
ARGID="1"
while IFS= read -r -d $'\0' FILE; do
    ARGS[ARGID++]="$FILE"
done < <(find "$@" -type f -name '*.txt' -print0)

This works great and handles all types of filename variations perfectly. Sometimes, however, I will pass a non-existent path to the script, e.g.:

$ findscript.sh existingfolder nonexistingfolder
find: `nonexistingfolder': No such file or directory
...

Under normal circumstances I would have the script capture the exit code with something like RET=$? and use it to decide how to proceed. This does not seem to work with the process substitution above.

What's the correct procedure in cases like this? How can I capture the return code? Are there other more suitable ways to determine if something went wrong in the substituted process?
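
The symptom can be reproduced with a minimal sketch like the following. The temporary directory and the variable names `demo_dir` and `loop_status` are assumptions made for illustration only. The status visible after the loop belongs to the while loop itself, so the failure of find goes unnoticed:

```bash
#!/usr/bin/env bash
# Illustrative only: demo_dir and loop_status are made-up names.
demo_dir=$(mktemp -d)

# find fails here because the path does not exist...
while IFS= read -r -d '' FILE; do
    :
done < <(find "$demo_dir/missing" -type f -print0 2>/dev/null)
loop_status=$?    # ...but this is the status of the while loop, not of find

echo "status seen after the loop: $loop_status"    # prints 0
rm -rf "$demo_dir"
```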

Best Answer

You can pretty easily get the return from any subshelled process by echoing its return out over its stdout. The same is true of process substitution:

while IFS= read -r -d $'\0' FILE || 
    ! return=$FILE
do    ARGS[ARGID++]="$FILE"
done < <(find . -type f -print0; printf "$?")

If I run that, the very last line (or \0-delimited section, as the case may be) is going to be find's return status. read returns 1 when it hits EOF, so the only time $return is set to $FILE is for the very last bit of information read in.

I use printf to avoid adding an extra newline. This is important because even a regular read, one that does not delimit on \0 NULs, returns non-zero when the data it has just read does not end in a newline. So if your last line does not end with a newline, the last value in your read variable is going to be your return status.

Running the command above and then:

echo "$return"

OUTPUT

0

And if I alter the process substitution part...

...
done < <(! find . -type f -print0; printf "$?")
echo "$return"

OUTPUT

1

A simpler demonstration:

printf \\n%s list of lines printed to pipe |
while read v || ! echo "$v"
do :; done

OUTPUT

pipe

And in fact, so long as the return you want is the last thing you write to stdout from within the process substitution - or any subshelled process from which you read in this way - then $FILE is always going to be the return status you want when it is through. And so the || ! return=... part is not strictly necessary - it is used to demonstrate the concept only.
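
Applied to the script from the question, the idea could be sketched roughly as follows. The function name `collect_txt`, the variables `files` and `find_status`, and the temporary demo directory are hypothetical and not part of the original answer. The trailing chunk that read returns at EOF is kept as the status instead of being appended to the array:

```bash
#!/usr/bin/env bash
# collect_txt is a hypothetical helper; it fills the global array "files"
# and stores find's exit status in the global variable "find_status".
collect_txt() {
    files=()
    find_status=
    local chunk
    # read fails only on the last, unterminated chunk, which is the status
    # printed by printf; the braces record it and then end the loop.
    while IFS= read -r -d '' chunk || { find_status=$chunk; false; }; do
        files+=("$chunk")
    done < <(find "$@" -type f -name '*.txt' -print0; printf '%s' "$?")
}

# Demo with one existing and one missing directory (made-up paths).
demo_dir=$(mktemp -d)
touch "$demo_dir/a.txt" "$demo_dir/b c.txt"
collect_txt "$demo_dir" "$demo_dir/missing" 2>/dev/null
echo "found ${#files[@]} file(s), find exited with $find_status"
rm -rf "$demo_dir"
```

Because the status travels on the same stream as the file names, it must be the last thing written, exactly as described above.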

Related Solutions

Default exit code when process is terminated

Processes can call the _exit() system call (on Linux, see also exit_group()) with an integer argument to report an exit code to their parent. Though it's an integer, only the 8 least significant bits are available to the parent (the exception is when using waitid() or a handler on SIGCHLD in the parent to retrieve that code, though not on Linux).

The parent will typically do a wait() or waitpid() to get the status of their child as an integer (though waitid() with somewhat different semantics can be used as well).

On Linux and most Unices, if the process terminated normally, bits 8 to 15 of that status number will contain the exit code as passed to exit(). If not, then the 7 least significant bits (0 to 6) will contain the signal number and bit 7 will be set if a core was dumped.

perl's $? for instance contains that number as set by waitpid():

$ perl -e 'system q(kill $$); printf "%04x\n", $?'
000f # killed by signal 15
$ perl -e 'system q(kill -ILL $$); printf "%04x\n", $?'
0084 # killed by signal 4 and core dumped
$ perl -e 'system q(exit $((0xabc))); printf "%04x\n", $?'
bc00 # terminated normally, 0xbc the lowest 8 bits of the status
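
The bit layout described above can be decoded with plain shell arithmetic. This is only a sketch: `decode_wait_status` is a made-up helper name, it assumes the Linux-style layout described in the text, and it ignores stopped or continued processes:

```bash
#!/usr/bin/env bash
# decode_wait_status is a hypothetical helper that takes a raw wait()
# status (as printed by the perl examples) and interprets its bits.
decode_wait_status() {
    local raw=$(( $1 ))
    local signal=$(( raw & 0x7f ))    # bits 0 to 6: signal number
    local core=""
    if (( signal == 0 )); then
        # bits 8 to 15: exit code of a normally terminated process
        echo "exited normally with code $(( (raw >> 8) & 0xff ))"
        return
    fi
    if (( raw & 0x80 )); then         # bit 7: core dump flag
        core=" (core dumped)"
    fi
    echo "killed by signal ${signal}${core}"
}

decode_wait_status 0x000f    # killed by signal 15
decode_wait_status 0x0084    # killed by signal 4 (core dumped)
decode_wait_status 0xbc00    # exited normally with code 188
```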

Bourne-like shells also make the exit status of the last run command available in their own $? variable. However, it does not directly contain the number returned by waitpid(), but a transformation of it, and that transformation differs between shells.

What's common between all shells is that $? contains the lowest 8 bits of the exit code (the number passed to exit()) if the process terminated normally.

Where it differs is when the process is terminated by a signal. In all cases, and that's required by POSIX, the number will be greater than 128. POSIX doesn't specify what the value may be. In practice though, in all Bourne-like shells that I know, the lowest 7 bits of $? will contain the signal number. But, where n is the signal number,

in ash, zsh, pdksh, bash, the Bourne shell, $? is 128 + n. What that means is that in those shells, if you get a $? of 129, you don't know whether it's because the process exited with exit(129) or whether it was killed by signal 1 (HUP on most systems). But the rationale is that shells, when they exit themselves, by default return the exit status of the last exited command. Making sure $? is never greater than 255 allows for a consistent exit status:
```
$ bash -c 'sh -c "kill \$\$"; printf "%x\n" "$?"'
bash: line 1: 16720 Terminated              sh -c "kill \$\$"
8f # 128 + 15
$ bash -c 'sh -c "kill \$\$"; exit'; printf '%x\n' "$?"
bash: line 1: 16726 Terminated              sh -c "kill \$\$"
8f # here that 0x8f is from an exit(143) done by bash. Though it's
   # not from a killed process, that does tell us that probably
   # something was killed by a SIGTERM
```
in ksh93, $? is 256 + n. That means that from the value of $? you can differentiate between a killed and a non-killed process. Newer versions of ksh, upon exit, if $? was greater than 255, kill themselves with the same signal in order to report the same exit status to their parent. While that sounds like a good idea, it means that ksh will generate an extra core dump (potentially overwriting the other one) if the process was killed by a core-generating signal:
```
$ ksh -c 'sh -c "kill \$\$"; printf "%x\n" "$?"'
ksh: 16828: Terminated
10f # 256 + 15
$ ksh -c 'sh -c "kill -ILL \$\$"; exit'; printf '%x\n' "$?"
ksh: 16816: Illegal instruction(coredump)
Illegal instruction(coredump)
104 # 256 + 4, ksh did indeed kill itself so as to report the same
    # exit status as sh. Older versions of `ksh93` would have returned
    # 4 instead.
```
Where you could even say there's a bug is that ksh93 kills itself even if $? comes from a return 257 done by a function:
```
$ ksh -c 'f() { return "$1"; }; f 257; exit'
zsh: hangup     ksh -c 'f() { return "$1"; }; f 257; exit'
# ksh kills itself with a SIGHUP so as to report a 257 exit status
# to its parent
```
in yash, $? is 256 + 128 + n, which is a compromise. That means we can also differentiate between a killed process and one that terminated properly. And upon exiting, yash reports 128 + n without having to kill itself, avoiding the side effects that can have.
```
$ yash -c 'sh -c "kill \$\$"; printf "%x\n" "$?"'
18f # 256 + 128 + 15
$ yash -c 'sh -c "kill \$\$"; exit'; printf '%x\n' "$?"
8f  # that's from an exit(143), yash was not killed
```

To get the signal from the value of $?, the portable way is to use kill -l:

$ /bin/kill 0
Terminated
$ kill -l "$?"
TERM

(for portability, you should never use signal numbers, only signal names)
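
As a rough illustration of the kill -l approach, a script might wrap it like this. `describe_status` is a hypothetical helper name, and the "greater than 128" test assumes the 128 + n convention of bash and similar shells, so a plain exit(143) would be reported the same way as a SIGTERM:

```bash
#!/usr/bin/env bash
# describe_status is a hypothetical helper, not part of any shell.
describe_status() {
    local status=$1 name
    # Above 128, ask kill -l for the signal name; fall back if it has none.
    if [ "$status" -gt 128 ] && name=$(kill -l "$status" 2>/dev/null); then
        printf 'status %d: possibly killed by SIG%s\n' "$status" "$name"
    else
        printf 'status %d: normal exit\n' "$status"
    fi
}

sh -c 'kill $$' || describe_status "$?"
sh -c 'exit 3'  || describe_status "$?"
```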

On the non-Bourne fronts:

csh/tcsh and fish: same as the Bourne shell, except that the status is in $status instead of $? (note that zsh also sets $status, in addition to $?, for compatibility with csh).
rc: the exit status is in $status as well, but when killed by a signal, that variable contains the name of the signal (like sigterm, or sigill+core if a core was generated) instead of a number, which is yet another proof of the good design of that shell.
es: the exit status is not a variable. If you care about it, you run the command as:
```
status = <={cmd}
```
which will return a number or sigterm or sigsegv+core like in rc.

Maybe for completeness, we should mention zsh's $pipestatus and bash's $PIPESTATUS arrays that contain the exit status of the components of the last pipeline.
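
For bash, a small sketch of reading that array might look like this. The variable name `statuses` is arbitrary. The array has to be copied right away, because the next command overwrites it:

```bash
#!/usr/bin/env bash
# Three pipeline components with different exit statuses.
sh -c 'exit 3' | false | true

# Copy PIPESTATUS immediately; any later command replaces its contents.
statuses=("${PIPESTATUS[@]}")

echo "component statuses: ${statuses[*]}"    # component statuses: 3 1 0
```

Without pipefail, $? after that pipeline only reflects the last component, which is why the array is needed to see the earlier failures.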

And also for completeness, when it comes to shell functions and sourced files, by default functions return with the exit status of the last command run, but can also set a return status explicitly with the return builtin. And we see some differences here:

bash and mksh (since R41, a regression^Wchange apparently introduced intentionally) will truncate the number (positive or negative) to 8 bits. So for instance return 1234 will set $? to 210, return -- -1 will set $? to 255.
zsh and pdksh (and derivatives other than mksh) allow any signed 32 bit decimal integer (-2³¹ to 2³¹-1) (and truncate the number to 32bits).
ash and yash allow any positive integer from 0 to 2³¹-1 and return an error for any number out of that.
ksh93 for return 0 to return 320 set $? as is, but for anything else, truncate to 8 bits. Beware as already mentioned that returning a number between 256 and 320 could cause ksh to kill itself upon exit.
rc and es allow returning anything even lists.
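
The bash behaviour from that list can be checked with a short sketch. `truncated_return` is a made-up function name, and the results shown apply to bash only; the other shells behave as described above:

```bash
#!/usr/bin/env bash
# truncated_return is a hypothetical function that returns its argument.
truncated_return() { return "$1"; }

status=0
truncated_return 1234 || status=$?
echo "return 1234 gives $status"    # 1234 modulo 256 is 210

wrapped=0
truncated_return 256 || wrapped=$?
echo "return 256 gives $wrapped"    # 256 modulo 256 is 0
```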

Also note that some shells also use special values of $?/$status to report some error conditions that are not the exit status of a process, like 127 or 126 for command not found or not executable (or syntax error in a sourced file)...

Bash – How to Propagate Errors in Process Substitution

You can only work around that issue, for example with something like this:

cat <(false || kill $$) <(echo ok)
other_command

The shell running the script is sent SIGTERM before the second command (other_command) can be executed. The echo ok command is executed only "sometimes": the problem is that process substitutions are asynchronous. There's no guarantee that the kill $$ command is executed before or after the echo ok command. It's a matter of the operating system's scheduling.

Consider a bash script like this:

#!/bin/bash
set -e
set -o pipefail
cat <(echo pre) <(false || kill $$) <(echo post)
echo "you will never see this"

The output of that script can be:

$ ./script
Terminated
$ echo $?
143           # it's 128 + 15 (signal number of SIGTERM)

Or:

$ ./script
Terminated
$ pre
post

$ echo $?
143

You can try it and, after a few tries, you will see the two different orders in the output. In the first one, the script was terminated before the other two echo commands could write to the file descriptor. In the second one, the false or the kill command was probably scheduled after the echo commands.

Or, to be more precise: the kill() system call made by the kill utility, which sends the SIGTERM signal to the shell's process, was scheduled (or the signal was delivered) later or earlier than the echo write() syscalls.

Either way, the script stops and the exit code is not 0. It should therefore solve your issue.

Another solution is, of course, to use named pipes for this. But, it depends on your script how complex it would be to implement named pipes or the workaround above.
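
A named-pipe variant might be sketched as follows. The directory layout and the names `work_dir`, `fifo`, `files` and `find_status` are assumptions for illustration. Because find runs as an ordinary background job, its exit status can be collected with wait once the loop has consumed its output:

```bash
#!/usr/bin/env bash
# Illustrative setup: a scratch directory with a FIFO and two sample files.
work_dir=$(mktemp -d)
fifo=$work_dir/results.fifo
mkfifo "$fifo"
mkdir "$work_dir/data"
touch "$work_dir/data/one.txt" "$work_dir/data/two.txt"

# Run find in the background, writing to the FIFO; one path is missing
# on purpose so that find reports a failure.
find "$work_dir/data" "$work_dir/missing" -type f -name '*.txt' -print0 \
    >"$fifo" 2>/dev/null &
find_pid=$!

files=()
while IFS= read -r -d '' file; do
    files+=("$file")
done <"$fifo"

# wait returns the exit status of the background find.
find_status=0
wait "$find_pid" || find_status=$?

echo "collected ${#files[@]} file(s); find exited with $find_status"
rm -rf "$work_dir"
```

The cost compared to the process substitution is the extra bookkeeping: the FIFO has to be created and removed, and a reader must open it or the background find blocks.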
