Bash – Command substitution inside a function does not stop the script on a failure even if -e is set

bash shell-script

I have the following script, sandbox.sh:

#!/bin/bash
set -eu -o pipefail -E

function func1() {
  echo "FUNC1"
  exit 1
}

function func2() {
  local ret
  ret=$(func1)
  echo $ret
  echo "(func2)This line shouldn't be reached:'${?}'" >&2
}

var=$(func1) # The Line
echo "main:This line shouldn't be reached:'${var}':'${?}'" >&2

(GNU bash, version 4.4.20(1)-release (x86_64-pc-linux-gnu))

This stops executing, as expected:

$ bash -eu sandbox.sh 
$

However, if I modify "The Line" to var=$(func2), so that func1 is called through func2, I get the following output:

$ bash sandbox.sh 
(func2)This line shouldn't be reached:'0'
main:This line shouldn't be reached:'FUNC1':'0'
$

To me, it seems that command substitution behaves differently when it is placed inside a function, but I don't see why bash is designed that way.
It is also quite common for one function's output to be used by another, so such a difference is confusing.

NOTE: If I rewrite func2 as follows,

function func2() {
  func1
}

the script stops at The Line.
However, I believe programmers quite often want to manipulate the output of func1.

Best Answer

This is all perfectly understandable if we step through slowly. Some more logging is required, so run bash with the -x parameter, which will echo commands just before bash executes them, prefixed by + .

First run

$ bash -x sandbox.sh; echo $?
+ set -eu -o pipefail -E
++ func1
++ echo FUNC1
++ exit 1
+ var=FUNC1
1

-e says this shell will exit as soon as a command returns non-zero. Crucially, though, func1 runs in a subshell (because of $( )). The trace above shows this by prefixing those lines with two + signs (++ ).
The subshell writes FUNC1 to stdout and then exits with return code 1.
- Note: -e is off inside this subshell. The subshell quit because of the exit command, not because of -e. The way func1 is written, you can't really tell the difference.
Back in the first shell, we assign FUNC1 to the variable var. However, the exit code of this assignment command is the exit code of the last command substitution. Bash sees this failure (i.e., a non-zero exit code) and quits.

To quote the manual's SIMPLE COMMAND EXPANSION section:

If one of the expansions contained a command substitution, the exit status of the command is the exit status of the last command substitution performed.
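
The quoted rule can be checked without -e, so that the statuses can be inspected instead of ending the script. This is only a minimal sketch; the names assign_status and echo_status are made up for the illustration.

```bash
#!/bin/bash
# No 'set -e' here: we want to look at the statuses, not die on them.

# An assignment takes the exit status of its last command substitution.
assign_status=0
var=$(echo "FUNC1"; exit 1) || assign_status=$?
echo "assignment: var='${var}' status=${assign_status}"

# An ordinary command such as echo reports its own status instead,
# so the failure inside $( ) is no longer visible afterwards.
echo "echo: [$(exit 1)]"
echo_status=$?
echo "echo status=${echo_status}"
```

The assignment reports the status of the substitution (1), while echo reports its own success (0).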

Second run

The explanation is exactly the same as for the first run, and we note again that -e is not in effect inside the subshell. This time, however, there is a material difference, and we get a clearer view of what is happening:

- Inside the subshell that runs func2, the assignment ret=$(func1) returns 1, but -e is off there, so func2 carries on.
- The exit code of func2 is the exit code of its last command.
- That echo always succeeds.
- func2 always succeeds.
- The assignment always succeeds.

-e has no effect.
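
One possible workaround, which is not part of the original script, is to propagate the failure by hand inside func2 with || return. The sketch below runs a modified copy of sandbox.sh in a child bash so that its exit status can be inspected; the wrapper variables script, status and output are assumptions made for the demonstration.

```bash
#!/bin/bash
# The script under test runs in a child bash so its exit does not end this demo.
script='
set -eu -o pipefail -E
func1() { echo "FUNC1"; exit 1; }
func2() {
  local ret
  ret=$(func1) || return "$?"   # -e is off in here, so pass the failure on by hand
  echo "$ret"
  echo "(func2) not reached" >&2
}
var=$(func2)
echo "main: not reached" >&2
'
status=0
output=$(bash -c "$script" 2>&1) || status=$?
echo "child exit status: ${status}"
echo "child output: '${output}'"
```

With the explicit return, func2 fails, so the substitution and the assignment var=$(func2) fail too, and -e in the main shell stops the script.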

`shopt -s inherit_errexit` ?

This turns on -e in subshells created by command substitution. It is, however, a difficult bedfellow: it does not guarantee that the script stops when a command fails.

Consider this:

set -e
shopt -s inherit_errexit

f() { echo a; (exit 22); echo b; }

echo "f says [$(f)] $?"
echo byee

This time the command substitution is part of an echo, rather than an assignment, and we get

+ set -e
+ shopt -s inherit_errexit
++ f
++ echo a
++ exit 22
+ echo 'f says [a] 22'
f says [a] 22
+ echo byee
byee

The subshell sees a command that fails with exit code 22. Since -e is in effect there, the subshell exits with code 22 (echo b does not execute).
Back in the first shell, echo gets a as the output of f, and $? expands to 22, the exit code of the subshell.
Thing is, unlike an assignment, the exit code of the echo is zero, so the first shell carries on.
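
For contrast, here is a sketch of the same function f with the substitution moved into a plain assignment, so that the failure becomes the status of a command that -e can act on. It assumes bash 4.4 or later (for inherit_errexit); the wrapper variables script, status and output are hypothetical and exist only to capture the result of the child bash.

```bash
#!/bin/bash
# Run the example in a child bash and capture what it prints and how it exits.
script='
set -e
shopt -s inherit_errexit
f() { echo a; (exit 22); echo b; }
out=$(f)                 # the assignment takes the status of $(f)
echo "f says [$out]"
echo byee
'
status=0
output=$(bash -c "$script" 2>&1) || status=$?
echo "child exit status: ${status}"
echo "child output: '${output}'"
```

Here the child shell stops at the assignment with status 22, and neither echo runs.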

Version

$ bash --version
GNU bash, version 5.0.17(1)-release (x86_64-redhat-linux-gnu)
Copyright (C) 2019 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>

This is free software; you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Related Solutions

bash script – Why ‘set -e’ Doesn’t Stop on ‘… && …’ Command

This is documented behavior. The bash(1) man page says, for set -e,

The shell does not exit if the command that fails is part of the command list immediately following a while or until keyword, part of the test following the if or elif reserved words, part of any command executed in a && or || list except the command following the final && or ||, any command in a pipeline but the last, or if the command's return value is being inverted with !.
[Emphasis added.]

And the POSIX Shell Command Language Specification confirms that this is the correct behavior:

The -e setting shall be ignored when executing the compound list following the while, until, if, or elif reserved word, a pipeline beginning with the ! reserved word, or any command of an AND-OR list other than the last.

and Section 2.9.3 Lists of that document defines

An AND-OR list is a sequence of one or more pipelines separated by the operators "&&" and "||" .
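
A small sketch of this rule, run in a child bash so that the outcome can be inspected. The wrapper variables script, status and output are assumptions made for the demonstration.

```bash
#!/bin/bash
script='
set -e
false && echo "never printed"   # false is not the last command of the list: no exit
echo "still running"
false                           # a plain failing command: the shell exits here
echo "not reached"
'
status=0
output=$(bash -c "$script" 2>&1) || status=$?
echo "child exit status: ${status}"
echo "child output: '${output}'"
```

The failing false on the left of && is ignored by -e, while the standalone false ends the script.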

Bash – command substitution inside awk

First, a disclaimer: Please don't parse the output of find. The code below is for illustration only, of how to incorporate command substitution into an Awk script in such a way that the commands can act upon pieces of Awk's input.

To actually do a line count (wc -l) on each file found with find (which is the example use case), just use:

 find . -type f -name '*txt' -exec wc -l {} +

However, to answer your questions as asked:

Q1

To answer your Q1:

Q1: is there a way to perform command substitution inside awk?

Of course there is a way. From man awk:

command | getline [var]
Run command piping the output either into $0 or var, as above, and RT.

So (watch the quoting!):

find . | awk '/txt$/{"wc -l <\"" $NF "\"|cut -f1" | getline(nl); print(nl)}'

Please note that the string built, and therefore the command executed, is essentially

wc -l <file

Redirecting the file into wc keeps wc from printing the filename.

That said, I left out the close() that the command needs (harmless for a couple of files, but technically incorrect). You actually need to do:

find . | awk '/txt$/{
                       comm="wc -l <\"" $NF "\" | cut -f1"
                       comm | getline nl;
                       close (comm);
                       print nl 
                    }'

That also works in older awk versions.
Remember to keep find . from printing the dot (.) entry: it makes the code fail, because a dot is a directory and wc cannot read it.

Alternatively, skip the dot entry inside awk:

find . | awk '/txt$/ && $NF!="." {  comm="wc -l <\"" $NF "\" | cut -f1"
                                    comm | getline nl;
                                    close (comm);
                                    print nl 
                                 }'

You can convert that to a one-liner, but it will look quite ugly, methinks.
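
As a self-contained check of the command | getline / close pattern, here is a sketch that builds its own test data. The scratch directory and the file sample.txt are hypothetical, and the sketch uses $0 rather than $NF so that a name containing spaces is passed whole (names containing double quotes would still break the generated command).

```bash
#!/bin/bash
# Hypothetical test data: one three-line file in a scratch directory.
tmpdir=$(mktemp -d)
printf 'one\ntwo\nthree\n' > "$tmpdir/sample.txt"

counts=$(find "$tmpdir" -type f | awk '/txt$/ {
    comm = "wc -l <\"" $0 "\""      # build the shell command as a string
    if ((comm | getline nl) > 0)    # getline returns 1 on success, 0 at EOF, -1 on error
        print nl + 0                # "+ 0" strips any padding wc adds
    close(comm)                     # release the pipe before the next file
}')

rm -rf "$tmpdir"
echo "line counts: ${counts}"
```

Checking the return value of getline avoids printing a stale nl when the command produces no output.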

Q2

As for your second question:

Q2: why is the first incantation above silently failing and is simply printing the filenames instead?

Because awk does not parse shell commands; it parses the text as an awk expression. It understands the command as:

nl = $(wc -l $NF)
nl --> variable
$ --> pointer to a field
wc --> variable (that has zero value here)
-  --> minus sign
l  --> variable (that has a null string)
$  --> Pointer to a field
NF --> Last field

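Subtraction binds more tightly than concatenation in awk, so wc - l is evaluated first (0 - 0, giving 0), and the result is then concatenated with the text of the last field (the name of a file). Used as a field number, that string has the numeric value 0, because find . prints names that start with ./ rather than with a digit.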

For awk, it becomes:

nl = $( wc -l $NF)
nl = $ ( 0 - 0 )

Which becomes just $0, the whole input line, which (for the simple find above) is only the file name.

So, all the above will only print the filename (well, technically, the whole line).
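
This can be confirmed with a one-line sketch. The string ./notes.txt is a made-up stand-in for one line of find output; no such file needs to exist, because awk never runs wc.

```bash
#!/bin/bash
# The single quotes keep the shell from expanding $( ); awk sees the text as-is.
result=$(echo "./notes.txt" | awk '{ nl = $(wc -l $NF); print nl }')
echo "awk printed: ${result}"
```

awk echoes the input line back, which is exactly the "silent failure" described above.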