Shell – How to get further information about the origin of an exit code

Tags: error-handling, exit, exit-status, shell-script

Sometimes I need to maintain programs that invoke shell scripts, which in turn invoke other programs and scripts. So when the main shell script ends with exit code 126, it is a struggle to find out which of the invoked scripts or commands set that exit code.

Is there a way to see which command caused that exit code, so that I can more easily check its permissions?

Best Answer

If you are on Linux, you could run the command under strace -fe process to find out which process did an exit_group(126), and what command it (or one of its parents, if it didn't execute anything itself) last executed before doing that:

$ strace -fe process sh -c 'env sh -c /; exit'
execve("/bin/sh", ["sh", "-c", "env sh -c /; exit"], [/* 53 vars */]) = 0
arch_prctl(ARCH_SET_FS, 0x7f24713b1700) = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f24713b19d0) = 26325
strace: Process 26325 attached
[pid 26324] wait4(-1,  <unfinished ...>
[pid 26325] execve("/usr/bin/env", ["env", "sh", "-c", "/"], [/* 53 vars */]) = 0
[pid 26325] arch_prctl(ARCH_SET_FS, 0x7fbdb4e2c700) = 0
[pid 26325] execve("/bin/sh", ["sh", "-c", "/"], [/* 53 vars */]) = 0
[pid 26325] arch_prctl(ARCH_SET_FS, 0x7fef90b3b700) = 0
[pid 26325] clone(strace: Process 26326 attached
child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fef90b3b9d0) = 26326
[pid 26325] wait4(-1,  <unfinished ...>
[pid 26326] execve("/", ["/"], [/* 53 vars */]) = -1 EACCES (Permission denied)
sh: 1: /: Permission denied
[pid 26326] exit_group(126)             = ?
[pid 26326] +++ exited with 126 +++
[pid 26325] <... wait4 resumed> [{WIFEXITED(s) && WEXITSTATUS(s) == 126}], 0, NULL) = 26326
[pid 26325] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=26326, si_uid=10031, si_status=126, si_utime=0, si_stime=0} ---
[pid 26325] exit_group(126)             = ?
[pid 26325] +++ exited with 126 +++
<... wait4 resumed> [{WIFEXITED(s) && WEXITSTATUS(s) == 126}], 0, NULL) = 26325
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=26325, si_uid=10031, si_status=126, si_utime=0, si_stime=0} ---
exit_group(126)                         = ?
+++ exited with 126 +++

Above, process 26326 was the first to exit with 126, because it attempted to execute /. It was a child of process 26325, which had last executed sh -c /.
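With a long trace, spotting that first exit by eye gets tedious. Below is a minimal sketch of post-processing the log instead; exit_origin is a hypothetical helper, not part of strace. It assumes the log was written with strace -f -e trace=process -o LOG (with -f and -o, strace starts every line with the PID rather than "[pid N]"), and it prints the first process reported as exiting with the given code, plus the last execve() that process attempted or, if it made none, the one attempted by its nearest ancestor.

```shell
#!/bin/sh
# exit_origin CODE LOG: hypothetical helper, not part of strace.
# LOG is assumed to come from: strace -f -e trace=process -o LOG CMD...
# With -f and -o, strace starts every line with the PID, so $1 is the PID.
exit_origin() {
  awk -v code="$1" '
    # Remember the most recent execve() attempt of each PID, failed or not.
    $2 ~ /^execve[(]/ { last[$1] = $0 }

    # Record parent links from completed clone/fork lines ending in "= CHILD",
    # including the "<... clone resumed> ... = CHILD" form.
    ($2 ~ /^(clone3?|v?fork)[(]/ || ($2 == "<..." && $3 ~ /^(clone3?|v?fork)$/)) &&
    $(NF-1) == "=" && $NF ~ /^[0-9]+$/ { parent[$NF] = $1 }

    # The first "+++ exited with CODE +++" line names the originating process.
    $2 == "+++" && $3 == "exited" && $5 == code {
      pid = $1
      print "first exit with " code ": pid " pid
      # If that PID never called execve(), walk up to the nearest ancestor
      # that did (bounded, in case of PID reuse).
      for (n = 0; !(pid in last) && (pid in parent) && n < 1000; n++)
        pid = parent[pid]
      if (pid in last) print "last execve: " last[pid]
      exit
    }' "$2"
}
```

For example, with hypothetical file names, run strace -f -e trace=process -o /tmp/trace.log ./main.sh and then exit_origin 126 /tmp/trace.log. For a run like the one above, this should report pid 26326 and its failed execve of /. The strace output format varies a little between versions, so the field positions assumed here may need adjusting.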

If those scripts are bash scripts, or sh scripts on a system where sh happens to be bash, you could do:

$ env SHELLOPTS=xtrace \
      BASH_XTRACEFD=7 7>&2 \
      PS4='[$?][$BASHPID|${BASH_SOURCE:-$BASH_EXECUTION_STRING}|$LINENO]+ ' \
    sh -c 'env sh -c /; exit'
[0][30625|env sh -c /; exit|0]+ env sh -c /
[0][30626|/|0]+ /
sh: /: Is a directory
[126][30625|env sh -c /; exit|0]+ exit

That doesn't tell us exactly which process exited with 126, but it could give you enough of a clue.

We use BASH_XTRACEFD=7 7>&2 so that the traces are output on the original stderr, even when stderr is redirected within the scripts. Otherwise those trace messages could affect the behaviour of the scripts if they do things like (....) 2>&1 | .... That assumes those scripts don't explicitly use or close fd 7 themselves (which would be unlikely, far more unlikely than them redirecting stderr).
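To narrow that clue down, the trace can be written to a file and scanned for the first trace line whose $? field shows the code of interest: the preceding trace line from the same shell (same $BASHPID) is the command that returned it. A minimal sketch, assuming the PS4 format used above; xtrace_origin is a hypothetical helper name, not part of bash.

```shell
#!/bin/sh
# xtrace_origin CODE LOG: hypothetical helper, not part of bash.
# LOG is assumed to hold trace lines written with the PS4 shown above:
#   [STATUS][PID|SOURCE|LINENO]+ COMMAND
# where STATUS is the $? left by the previous command of shell PID.
xtrace_origin() {
  awk -v code="$1" '
    {
      line = $0
      # bash repeats the first character of PS4 for nested levels, e.g. "[[0][...".
      sub(/^[[]+/, "", line)
      # Skip anything that is not a trace line (error messages, script output).
      if (line !~ /^[0-9]+][[][0-9]+[|]/) next
      status = substr(line, 1, index(line, "]") - 1)
      rest = substr(line, index(line, "][") + 2)
      pid = substr(rest, 1, index(rest, "|") - 1)
      # The first line of shell PID showing CODE follows the command that set it.
      if (status == code && (pid in prev)) {
        print "shell " pid " got status " code " from: " prev[pid]
        exit
      }
      prev[pid] = $0
    }' "$2"
}
```

For example, use 7>/tmp/xtrace.log (a hypothetical path) in place of 7>&2 in the command above, then run xtrace_origin 126 /tmp/xtrace.log. For the trace shown earlier, this reports that shell 30625 got 126 from env sh -c /. As noted, that is the shell that saw the status, not necessarily the deepest process that produced it.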