Bash Scripting – $! Not Set to PID in Process Substitution

bash, scripting, shell-script

Bash 4.4.19(1)-release

Below is a simple script that is the basis for a logging app.
For various reasons I had to use process substitution.

The runner is the heart of the app. Since process substitution is asynchronous, I use the while loop to keep the output reasonably coherent. It works perfectly.

Unfortunately, I found a case where it does not work: when I execute 'bash <filename> <function>'.

So we need 2 files to reproduce.

Questions:

Why does this happen?
How to modify my while loop to accommodate similar cases?

Simplified script is:

test.sh

#!/bin/bash

2sub() {
    local in=$(cat); echo -e "$in"
}
runner() {
    "${@}" 1> >(2sub)
    while [ -e /proc/$! ]; do sleep 0.1; done     # <<< LOOP WAIT FOR $!
}
remotesub() {
    bash ./test2.sh remotesub2
}

echo -e "running\n"
runner bash ./test2.sh remotesub2 # LOOPS
# runner remotesub # A POSSIBLE BYPASS/SOLUTION? But why?
echo -e "done!\n"

test2.sh

     remotesub2() {
         echo -e "'${BASH_VERSION}'"
         return 0
     }

     "$@"

Bypass:

As you can see from the script, there is a bypass for the problem: wrap bash <filename> <function> in a function and pass that function to the runner. I am sure somebody here knows why this works while the direct way does not.

Please shed some light on this issue and if there are some better ways to do the waiting loop in order to cover these cases.

Solution:

The best solution is what mosvy suggested. Thank you.
Using { "${@}"; } removes the need to wrap the commands in separate small functions, which is a pain. Also, after many hours of testing with my larger code, I concluded that careful killing of sub-processes makes the while [ -e /proc/$! ]; do sleep 0.1; done loop unnecessary. That line was replaced with wait $!.
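A minimal sketch of what the resulting runner might look like, assuming the grouping and wait $! described above. The sub helper stands in for 2sub from the question, and the sample command is only an illustration:

```bash
#!/bin/bash

# Collects everything from stdin, then prints it (stands in for 2sub).
sub() {
    local in
    in=$(cat)
    printf '%s\n' "$in"
}

runner() {
    # Redirecting a group command instead of the bare command keeps the
    # process substitution a child of this shell, so $! gets set.
    { "$@"; } > >(sub)
    # The question reports this working on bash 4.4.19; older versions
    # may not be able to wait on a process substitution.
    wait "$!"
}

runner /bin/echo "hello from an external command"
echo "done!"
```

Because runner only returns after sub has exited, "done!" is printed after the captured output.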

Best Answer

If I understand you correctly, you're wondering why $! is set to the PID of a process run inside >(...) only when that is part of the command line of a built-in command or function, but not when it's part of the command line of an external command.

Simplified example:

$ bash -c 'true > >(echo in=$BASHPID; sleep .1); echo psubst=$!'
psubst=12392
in=12392

$ bash -c '/bin/true > >(echo in=$BASHPID; sleep .1); echo psubst=$!'
in=12751
psubst=

That happens because, when an external command is used, bash forks a separate process to run it in. The process running inside the >(...) is then started as a child of that process, and so as a grandchild of your script, completely outside of its control.

By the time the external command terminates, its child (if still running) will be adopted by pid 1 (init), and so any link that could still be used to retrieve its PID from your script is broken.

A workaround may be to use a wrapper function which will cause all the process substitutions from its command line to be run as children of your script, so their PIDs could be retrieved via pgrep -P "$$".

Putting the external command in a {...} block and redirecting the output of the block also seems to work:

$ bash -c 'func(){ /bin/true; }; func > >(echo in=$BASHPID; sleep .1); echo psubst=$!'
in=3574
psubst=3574
$  bash -c '{ /bin/true; } > >(echo in=$BASHPID; sleep .1); echo psubst=$!'
in=3435
psubst=3435

Both workarounds rely on the way the current implementation works; e.g., bash may decide one day to optimize away trivial group commands or functions, breaking these assumptions.

Notice that $! being set to the PID of the last process substitution is an undocumented feature, which also does not work in shells other than bash.
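One way to avoid depending on that undocumented behavior, not taken from the answer itself, is to swap the process substitution for a named pipe and an ordinary background job, for which $! is documented. A rough sketch, where runner_fifo and sub are hypothetical names:

```bash
#!/bin/bash

# Collects everything from stdin, then prints it.
sub() {
    local in
    in=$(cat)
    printf '%s\n' "$in"
}

runner_fifo() {
    local dir fifo pid status
    dir=$(mktemp -d) || return
    fifo=$dir/out
    mkfifo "$fifo" || { rm -rf "$dir"; return 1; }

    sub < "$fifo" &      # ordinary background job, so $! is its PID
    pid=$!

    "$@" > "$fifo"       # works for external commands and functions alike
    status=$?

    wait "$pid"          # the reader sees EOF once the writer has exited
    rm -rf "$dir"
    return "$status"
}

runner_fifo /bin/echo "hello through a named pipe"
```

The trade-off is a temporary directory to clean up, in exchange for a reader that is always a direct child of the script and for preserving the exit status of the command.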

Related Solutions

Shell Script – Prevent SIGINT from Interrupting Function Calls and Child Processes

There are several ways you can cut off the effect of Ctrl+C:

Change the terminal setting so that it doesn't generate a signal.
Block the signal so that it is saved for later delivery, when the signal becomes unblocked.
Ignore the signal, or set a handler for it.
Run subprocesses in a background process group.

Since you want to detect that Ctrl+C has been pressed, ignoring the signal is out. You could change the terminal settings, but then you would need to write custom key processing code. Shells don't provide access to signal blocking.

You can however isolate subprocesses from receiving the signal automatically by running them in a separate process group. Interactive shells run background commands in a separate process group by default, but non-interactive shells run them in the same process group, and all processes in the foreground process group receive a signal from terminal events. To tell the shell to run background jobs in a separate process group, run set -m. Running setsid ping … is another way of forcing ping to run in a separate process group.

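set -m    # run background jobs in their own process group
interrupted=
trap 'echo Interrupted, but ping may still be running' INT
ping … &
# wait returns a status above 128 when a trapped signal interrupts it
while wait; [ $? -ge 128 ]; do echo "Waiting for background jobs"; done
echo ping has finished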
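The retry loop around wait can be tried without a terminal. The sketch below is an illustration rather than part of the answer: it assumes USR1 as a stand-in for INT and a short sleep as a stand-in for ping, so that it runs unattended, and demo_wait_retry is a hypothetical name.

```bash
#!/bin/bash

demo_wait_retry() {
    local self=$BASHPID

    trap 'echo "signal received"' USR1

    sleep 1 &                                  # stands in for the long-running job
    ( sleep 0.2; kill -USR1 "$self" ) &        # stands in for pressing Ctrl+C

    # A trapped signal makes wait return early with a status above 128,
    # so keep waiting until it returns normally.
    while wait; [ $? -ge 128 ]; do
        echo "wait was interrupted, waiting again"
    done

    trap - USR1
    echo "background jobs have finished"
}

demo_wait_retry
```

The signal is sent to the shell only, so the background sleep keeps running and the second wait collects it.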

If you want Ctrl+Z to suspend a background process group, you'll need to propagate the signal from the shell.

Controlling signals finely is a bit of a stretch for a shell script, and shells other than ATT ksh tend to be a little buggy when you reach the corner cases, so consider a language that gives you more control such as Perl, Python or Ruby.

Bash Functions – Why Does Local fn=$(…) Mask the $? Status Code?

I thought this behavior was documented explicitly, because it's such a gotcha (especially when running bash scripts with -o errexit!), but it doesn't seem to be. My copy of the manual says the following (about declare, which behaves the same as local when within a function):

The return status is zero unless an invalid option is encountered, an attempt is made to define a function using ‘-f foo=bar’, an attempt is made to assign a value to a readonly variable, an attempt is made to assign a value to an array variable without using the compound assignment syntax [...], one of the names is not a valid shell variable name, an attempt is made to turn off readonly status for a readonly variable, an attempt is made to turn off array status for an array variable, or an attempt is made to display a non-existent function with -f.

So it would appear that local is not a keyword in the sense that one would expect in other programming languages. When an assignment-like parameter is provided to local, it is not treated as an initializer attached to a declaration; rather, the local built-in command takes care of making the assignment happen. The return code is that of local itself, not of the code possibly run in the initializer, and it will only be non-zero under the conditions listed above.

To perhaps answer the question in a more literal sense, as bishop mentioned in a comment, bash maintainer Chet Ramey was once asked if he would consider making local reflect failures happening during assignment, and responded, in essence, that assigning is not local's main mission:

Because that's not what local and its siblings [...] do. These builtins exist to assign and modify variable attributes. As an added feature, they support value assignment at the same time, but the important function is the attribute setting. They don't need to know how the value was computed. [...] Since the function is setting the attribute or value, the exit status should reflect whether or not that succeeded.

It may be worth noting that the same behavior can also be observed in the zsh shell.

The solution is to separate the two operations:

local variable
variable=$( somecommand )

exit_status=$?
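A small side-by-side sketch of the difference. The function names masked and separated are made up for illustration, and false stands in for a failing command:

```bash
#!/bin/bash

masked() {
    # $? here is the status of the local builtin, which succeeded.
    local value=$(false)
    echo "$?"
}

separated() {
    local value
    # A plain assignment passes through the status of the command substitution.
    value=$(false)
    echo "$?"
}

masked       # prints 0
separated    # prints 1
```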
