Bash – How to Determine Exit Code Source After Command Execution

bash, exit-status

My bash knowledge is a bit rusty (and hasn't been very solid before as well), so I seem to be unable to find an answer to the following question:

As the heading says, I'd like to know how I can determine whether a non-zero exit code after command execution has been set by bash (meaning a real error) or by the command (possibly indicating an error, dependent on the command and my purpose).

For example, let's have a look at the following very simple script:

#!/bin/bash

string='abc'
grep 'd' <<< "$string"
echo $?

This outputs 1, which is expected after having read grep's manual (excerpt, shortening mine):

EXIT STATUS
Normally the exit status is 0 if a line is selected, 1 if no lines were selected, and 2 if an error occurred. […]

After having read the respective section of bash's manual, I am having a problem (excerpt, shortening and emphasis mine):

EXIT STATUS

[…]

If a command is not found, the child process created to execute it
returns a status of 127. If a command is found but is not executable,
the return status is 126.

If a command fails because of an error during expansion or
redirection, the exit status is greater than zero.

Shell builtin commands return a status of 0 (true) if successful, and
non-zero (false) if an error occurs while they execute. All builtins
return an exit status of 2 to indicate incorrect usage, generally
invalid options or missing arguments.

[…]

My problem is the emphasized statement, i.e. the one about errors during expansion or redirection.

My scripts generally need to treat real errors (like lack of permissions, needed programs not being available, resource exhaustion etc.) specially, but in my example above, it is not an error in the above-mentioned sense when grep does not select a line; instead, it just means that its input did not contain a matching character sequence.

However, if I take the section from bash's manual literally, it could be bash itself which may have set the exit status of 1. From that section, we know what happens if a command could not be found (exit status 127) or is not executable (exit status 126).
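The two documented cases can be reproduced with a short sketch. It assumes that no program called ThisProgramDoesNotExist is in $PATH and that mktemp is available; the scratch file is only an illustration of a file that exists but has no execute permission.

```shell
#!/bin/bash
# Case 1: a command name that is assumed not to exist anywhere in $PATH.
ThisProgramDoesNotExist 2>/dev/null
not_found_status=$?

# Case 2: a file that exists but is not executable
# (mktemp creates it with mode 0600, so no execute bit is set).
scratch=$(mktemp)
"$scratch" 2>/dev/null
not_executable_status=$?
rm -f "$scratch"

echo "not found: $not_found_status, not executable: $not_executable_status"
# prints "not found: 127, not executable: 126"
```

Note that a command is free to return 126 or 127 itself, so even these two values do not prove that the status came from bash.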

The next statement in that section, as I understand it, means that every other error can be mapped to any exit status in the inclusive range [1, 255] by bash. Notably, it can be mapped to exit status 1. I am considering that a major problem because I believe that there is a vast number of errors besides "command not found" or "command not executable". For example, command execution could be prevented by memory exhaustion, file handle exhaustion, timeouts due to disk read errors, and so on.

In contrast to the "grep could not find a matching line" error, these are real, severe errors which in most cases must cause an email to be sent to the administrator for immediate action.

But now it seems that I can't differentiate between the two sorts of errors (non-zero exit status set by executed command vs. non-zero exit status set by bash after having tried to execute the command).

Could anybody point me to a reasonable solution?

Similar questions

During my research, I have come across a lot of similar questions. However, to my best understanding, nobody had the exact same problem.

Instead, most people just wanted to suppress a non-zero exit code returned by a command (applied to my example, they would have wanted to have exit status 0 instead of 1 when grep did not select a line), and were given a solution similar to command || true.

While this might be acceptable for them, it is not a solution to me, because it also would suppress the real errors mentioned above. For example, consider the following:

root@cerberus:~/scripts# { ThisProgramDoesNotExist 2>/dev/null || true; } && { echo "Gotcha!"; }
Gotcha!
root@cerberus:~/scripts#

This demonstrates how that solution suppresses not only non-zero exit statuses (or is it "stati"?) from an executed command, but also severe errors reported by bash when failing to start a command. This is a no-go in most of my scripts.

Best Answer

You can't tell. All you get is a single value between 0 and 255, which is 0 if everything went well and nonzero otherwise.

If you want to treat some nonzero statuses as successes, be sure that the command in question can't fail for other reasons such as a redirection. Break up the command so that different kinds of failures happen in different commands or result in different statuses.
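As a minimal sketch of treating one nonzero status as a non-error, the snippet below maps grep's status to a label according to the EXIT STATUS excerpt quoted in the question. The helper name classify_grep_status is made up for this illustration. The caveat from the paragraph above still applies: a status of 1 is only read as "no match" on the assumption that the here-string redirection itself did not fail.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original answer): label a status
# returned by grep, following the EXIT STATUS section of grep's manual.
classify_grep_status() {
  case $1 in
    0) echo "match" ;;
    1) echo "no match" ;;
    *) echo "error" ;;      # 2 from grep, but also 126, 127 and anything else
  esac
}

string='abc'
grep -q 'd' <<< "$string"
status=$?                   # save it before any other command overwrites $?
result=$(classify_grep_status "$status")
echo "grep exited with $status: $result"
# prints "grep exited with 1: no match"
```

Only the statuses that grep documents as harmless are whitelisted; everything else falls through to "error" and can be escalated.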

For example, if you need to know whether an error comes from a redirection, either do the redirection in a separate command, or do it separately over a block.

Combined status:

mycommand <foo
status=$?
if [ $status -ne 0 ]; then echo "Either mycommand failed or <foo failed"; fi

Separate statuses, but no way to avoid running the command if the redirection fails:

{
  mycommand
  command_status=$?
} <foo
redirection_status=$?
if [ $command_status -ne 0 ]; then echo "mycommand failed"; fi
if [ $redirection_status -ne 0 ]; then echo "<foo failed"; fi

Do the redirection first. Note that being able to react to the failure of the redirection in this way is a bash feature. POSIX shells, including bash in POSIX mode, exit if a redirection on the exec builtin fails.

exec 3<&0         # Save stdin (file descriptor 0) to file descriptor 3
exec <foo         # bash keeps going if the redirection fails
redirection_status=$?
mycommand
command_status=$?
exec <&3          # Restore stdin
if [ $command_status -ne 0 ]; then echo "mycommand failed"; fi
if [ $redirection_status -ne 0 ]; then echo "<foo failed"; fi

Do the redirection first, in a subshell, to contain the failure of the redirection and to avoid having to restore the file descriptor afterwards.

(
  exec <foo || exit $?     # In POSIX sh, "|| exit $?" is redundant.
  mycommand
  command_status=$?
  if [ $command_status -ne 0 ]; then echo "mycommand failed"; fi
)
redirection_status=$?
if [ $redirection_status -ne 0 ]; then echo "<foo failed and mycommand didn't run"; fi

If you need to know whether an error comes from some other expansion, do the expansion separately and save its result.

Saving one argument: instead of mycommand "$(…)", save the result of the expansion first.

foo=$(…) && mycommand "$foo"

More generally:

foo=$(…)
command_substitution_status=$?
mycommand "$foo"
mycommand_status=$?

Note that if an assignment contains multiple command substitutions, its status is the status of the last substitution: the status is 0 if the last substitution succeeds, even if earlier ones failed.

foo=$(…)
foo_status=$?
bar=$(…)
bar_status=$?
mycommand "$foo" "$bar"
mycommand_status=$?
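The difference can be seen in a small sketch, with false and true standing in for the elided substitutions (they are placeholders, not the commands from the answer):

```shell
#!/bin/bash
# One assignment, two substitutions: only the last one determines $?.
foo=$(false)$(true)
combined_status=$?

# One substitution per assignment: each failure stays visible.
foo=$(false)
foo_status=$?
bar=$(true)
bar_status=$?

echo "combined=$combined_status foo=$foo_status bar=$bar_status"
# prints "combined=0 foo=1 bar=0"
```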

To save multiple arguments, use an array, or the positional parameters inside a function.

args=()
foo=$(…)
foo_status=$?
args+=(-x "$foo")
bar=$(…)
bar_status=$?
args+=(-y "$bar")
mycommand "${args[@]}"
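The variant with positional parameters inside a function might look like the following sketch. The function name run_with_args is hypothetical, printf stands in for mycommand, and the echo calls stand in for the elided substitutions.

```shell
#!/bin/bash
# Hypothetical wrapper: collect arguments in the function's own
# positional parameters, stopping at the first failed substitution.
run_with_args() {
  local foo bar                 # declare first: "local foo=$(…)" would hide the status
  foo=$(echo "first value") || return    # returns the substitution's status
  set -- -x "$foo"              # replaces any arguments passed to the function
  bar=$(echo "second value") || return
  set -- "$@" -y "$bar"
  printf '<%s>' "$@"            # stand-in for: mycommand "$@"
  echo
}

output=$(run_with_args)
status=$?
echo "$output (status $status)"
# prints "<-x><first value><-y><second value> (status 0)"
```

Declaring the variables with local on a separate line matters here, because the status of "local foo=$(…)" is that of the local builtin, not of the substitution.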

Solution

If you are running Bash 4.4 or later, you can use the shopt option inherit_errexit to make the shell also exit when a command inside a command substitution fails. You can check compatibility from within Bash using echo $BASH_VERSION.

Here is the shebang you would use if Bash 4.4 or later were installed and came before /bin in your $PATH:

#!/usr/bin/env -S bash -euET -o pipefail -O inherit_errexit

The -S is there to coax Linux’s env into accepting more than one argument for bash, as kindly pointed out by @UVV and explained further on StackOverflow.

Background

inherit_errexit is an option to shopt, while the rest of the arguments are options to set. In most modern iterations, they can be passed directly to bash when invoking the shell.

Let’s review the options you have already been using:

-u/-o nounset, as the name ambiguously hints, disallows dereferencing of variables that have not been set; e.g., $IJUSTMADETHISUP.
-e/-o errexit does some of what you are requesting: it causes directly called shell commands with nonzero return values to cause the shell to exit entirely.
-o pipefail is needed to extend this to commands whose output is redirected with an I/O pipe |.

Now for the options I’ve added:

-O inherit_errexit further extends this functionality (exiting on a nonzero status) to commands run inside command substitutions $(...), whose subshells otherwise do not inherit errexit.
The -E/-o errtrace and -T/-o functrace options are there for the comparatively rare case that you use trap to run a handler on the shell's ERR, DEBUG or RETURN pseudo-signals. They make shell functions, command substitutions and subshells inherit the ERR trap and the DEBUG/RETURN traps, respectively.
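The effect of two of these options can be observed with a small experiment. This is a sketch that assumes a bash of version 4.4 or later is found in $PATH; the snippet string is made up for the demonstration.

```shell
#!/bin/bash
# Requires bash 4.4 or later for inherit_errexit.
# The same snippet is run twice: a failing command followed by an echo,
# both inside a command substitution.
snippet='x=$(false; echo "kept going"); echo "$x"'

without=$(bash -e -c "$snippet")
without_status=$?
with=$(bash -e -O inherit_errexit -c "$snippet")
with_status=$?

# pipefail: the status of a pipeline reflects a failure in any stage.
plain=$(bash -c 'false | true; echo $?')
strict=$(bash -o pipefail -c 'false | true; echo $?')

echo "without inherit_errexit: '$without' (status $without_status)"
echo "with inherit_errexit:    '$with' (status $with_status)"
echo "pipeline status: plain=$plain pipefail=$strict"
```

Without inherit_errexit the substitution carries on after false and the script succeeds; with it, the substitution stops at false, the assignment fails, and errexit terminates the outer shell.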
