Bash – Putting Subshell in Background vs Putting Command in Background

Tags: bash, fork, shell-script, subshell

I have two bash scripts that try to check hosts that are up:

Script 1:

#!/bin/bash 
for ip in {1..254}; do
    ping -c 1 192.168.1.$ip | grep "bytes from" | cut -d" " -f 4 | cut -d ":" -f 1 &
done

Script 2:

#!/bin/bash 
for ip in {1..254}; do
    host=192.168.1.$ip
    (ping -c 1 $host > /dev/null 
    if [ "$?" = 0 ]
    then 
        echo $host
    fi) &
done

Since I am checking a large range, I would like to run the ping commands in parallel. However, my second script does not seem to retry fork attempts that fail because of resource limits. As a result, the second script gives inconsistent results, while my first script gives consistent results even though both fail to fork at times. Can someone explain this to me? Also, is there any way to retry failed forks?

Best Answer

Another answer already gives an improved code snippet for the task behind the question, but it does not address the question itself as directly.

The question is about the differences between

A) Backgrounding a "command" directly, vs
B) Putting a subshell into the background (i.e. with a similar task)

Let's check those differences by running 2 tests

# A) Backgrounding a command directly
sleep 2 & ps

outputs

[1] 4228
  PID TTY          TIME CMD
 4216 pts/8    00:00:00 sh
 4228 pts/8    00:00:00 sleep

while

# B) Backgrounding a subshell (with a similar task)
( sleep 2; ) & ps

outputs something like:

[1] 3252
  PID TTY          TIME CMD
 3216 pts/8    00:00:00 sh
 3252 pts/8    00:00:00 sh
 3253 pts/8    00:00:00 ps
 3254 pts/8    00:00:00 sleep

Test results:

In this test (which runs only a sleep 2) the subshell version does differ: it uses 2 child processes (two fork() calls and two PIDs), one more than backgrounding the command directly.

In script 1 of the question, however, the command was not a single sleep 2 but a pipeline of 4 commands. Testing that as an additional case

C) Backgrounding a pipe with 4 commands

# C) Backgrounding a pipe with 4 commands
sleep 2s | sleep 2s | sleep 2s | sleep 2s & ps

yields this

[2] 3265
  PID TTY          TIME CMD
 3216 pts/8    00:00:00 bash
 3262 pts/8    00:00:00 sleep
 3263 pts/8    00:00:00 sleep
 3264 pts/8    00:00:00 sleep
 3265 pts/8    00:00:00 sleep
 3266 pts/8    00:00:00 ps

which shows that script 1 actually puts a much higher strain on the system in terms of PIDs and fork() calls.

As a rough estimate, script 1 would have used about 254 * 4 ~= 1000 PIDs, even more than script 2 with 254 * 2 ~= 500 PIDs. Still, a problem caused by PID exhaustion seems unlikely, since on most Linux boxes

$ cat /proc/sys/kernel/pid_max
32768

gives you about 32 times the PIDs needed even for script 1, and the programs involved (i.e. ping, grep, cut) also seem unlikely to cause the inconsistent results.
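The estimate can be reproduced with shell arithmetic. This is only a sketch of the numbers used in this answer: the per-address process counts (4 and 2) come from the tests above, pid_max is the value printed above rather than one read from the running system, and the variable names are made up for illustration.

```shell
hosts=254
script1_pids=$((hosts * 4))    # four pipeline stages per address
script2_pids=$((hosts * 2))    # subshell plus ping per address
pid_max=32768                  # value printed above; /proc/sys/kernel/pid_max holds the real one

# Integer division, so the factors are rounded down.
echo "script 1: $script1_pids PIDs, pid_max is $((pid_max / script1_pids)) times that"
echo "script 2: $script2_pids PIDs, pid_max is $((pid_max / script2_pids)) times that"
```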

As mentioned by user @derobert, the real issue behind the failing scripts was the missing wait command: after the loop had put the commands into the background, the script reached its end, and the exit of the shell caused all the child processes to be terminated.
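A minimal sketch of a loop that ends with wait could look like the following. The function names check_host and scan_range are hypothetical and do not appear in the question; the loop is written with a POSIX while so that it does not depend on brace expansion.

```shell
# check_host is a hypothetical helper: it prints the address
# if a single ping gets a reply, and prints nothing otherwise.
check_host() {
    if ping -c 1 "$1" > /dev/null 2>&1; then
        echo "$1"
    fi
}

# Start one background job per address in PREFIX.1 .. PREFIX.254,
# then wait for all of them before returning.
scan_range() {
    prefix=$1
    ip=1
    while [ "$ip" -le 254 ]; do
        check_host "$prefix.$ip" &
        ip=$((ip + 1))
    done
    wait    # returns only after every background job has exited
}
```

Calling scan_range 192.168.1 would then cover the same range as the scripts in the question, and the function does not return before the last ping has finished.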

Related Solutions

Bash Subshell – Do Parentheses Really Put the Command in a Subshell?

A subshell starts out as an almost identical copy of the original shell process. Under the hood, the shell calls the fork system call¹, which creates a new process whose code and memory are copies². When the subshell is created, there are very few differences between it and its parent. In particular, they have the same variables. Even the $$ special variable keeps the same value in subshells: it's the original shell's process ID. Similarly $PPID is the PID of the parent of the original shell.

A few shells change a few variables in the subshell. Bash sets BASHPID to the PID of the shell process, which changes in subshells. Bash, zsh and mksh arrange for $RANDOM to yield different values in the parent and in the subshell. But apart from built-in special cases like these, all variables have the same value in the subshell as in the original shell, the same export status, the same read-only status, etc. All function definitions, alias definitions, shell options and other settings are inherited as well.
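These points can be checked with a short sketch. It uses command substitution, which also runs its command in a subshell, to capture the values; the variable names are chosen for illustration, and BASHPID is specific to bash, so it is empty in other shells.

```shell
x=1

# Each $(...) below runs in a subshell and captures what it prints.
sub_x=$(echo "$x")               # ordinary variables are copied
sub_dollar=$(echo "$$")          # $$ stays the original shell's PID
sub_bashpid=$(echo "$BASHPID")   # bash only: PID of the subshell itself

echo "x in the subshell:       $sub_x"
echo "\$\$ in the subshell:      $sub_dollar (original shell: $$)"
echo "BASHPID in the subshell: $sub_bashpid"
```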

A subshell created by (…) has the same file descriptors as its creator. Some other means of creating subshells modify some file descriptors before executing user code; for example, the left-hand side of a pipe runs in a subshell³ with standard output connected to the pipe. The subshell also starts out with the same current directory, the same signal mask, etc. One of the few exceptions is that subshells do not inherit custom traps: ignored signals (trap '' SIGNAL) remain ignored in the subshell, but other traps (trap CODE SIGNAL) are reset to the default action⁴.

A subshell is thus different from executing a script. A script is a separate program. This separate program might coincidentally be also a script which is executed by the same interpreter as the parent, but this coincidence doesn't give the separate program any special visibility on internal data of the parent. Non-exported variables are internal data, so when the interpreter for the child shell script is executed, it doesn't see these variables. Exported variables, i.e. environment variables, are transmitted to executed programs.

Thus:

x=1
(echo $x)

prints 1 because the subshell is a replication of the shell that spawned it.

x=1
sh -c 'echo $x'

happens to run a shell as a child process of a shell, but the x on the second line has no more connection with the x on the first line than in

x=1
perl -le 'print $x'

x=1
python -c 'print(x)'
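The role of export can be seen by running the same child shell before and after exporting x. This is a sketch with illustrative variable names; the leading unset only guards against an x that the surrounding environment may already have exported.

```shell
unset x                  # drop any x inherited from the environment
x=1

# Not exported: the child sh is a separate program and does not see x.
not_exported=$(sh -c 'echo "${x:-unset}"')

# Exported: x is now an environment variable and is passed to the child.
export x
exported=$(sh -c 'echo "${x:-unset}"')

echo "before export: $not_exported"
echo "after export:  $exported"
```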

¹ An exception is the ksh93 shell where the forking is optimised out and most of its side effects are emulated.
² Semantically, they're copies. From an implementation perspective, there's a lot of sharing going on.
³ For the right-hand side, it depends on the shell.
⁴ If you test this out, note that things like $(trap) may report the traps of the original shell. Note also that many shells have bugs in corner cases involving traps. For example ninjalj notes that as of bash 4.3, bash -x -c 'trap "echo ERR at \$BASH_SUBSHELL \$BASHPID" ERR; set -E; false; echo one subshell; (false); echo two subshells; ( (false) )' runs the ERR trap from the nested subshell in the “two subshells” case, but not the ERR trap from the intermediate subshell: the set -E option should propagate the ERR trap to all subshells, but the intermediate subshell is optimized away and so isn't there to run its ERR trap.

Bash – Assign Subshell background process pid to variable

bash shouldn't print the job status when non-interactive.

If that's indeed for an interactive bash, you can do:

{ pid=$(sleep 20 >&3 3>&- & echo "$!"); } 3>&1

We want sleep's stdout to go to where it was before, not the pipe that feeds the $pid variable. So we save the outer stdout in the file descriptor 3 (3>&1) and restore it for sleep inside the command substitution. So pid=$(...) returns as soon as echo terminates because there's nothing left with an open file descriptor to the pipe that feeds $pid.
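To see that the assignment returns right away, the same construct can be tried with a shorter sleep and followed by a check that the process is still alive. This is a sketch: the 2-second duration and the kill -0 check are additions for illustration.

```shell
# Save the outer stdout on fd 3, start sleep in the background inside the
# command substitution, and give sleep fd 3 as its stdout so that it does
# not keep the substitution's pipe open.
{ pid=$(sleep 2 >&3 3>&- & echo "$!"); } 3>&1

# kill -0 sends no signal; it only checks that the process exists.
if kill -0 "$pid" 2> /dev/null; then
    echo "sleep is still running in the background as PID $pid"
fi
```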

However note that because it's started from a subshell (here in a command substitution), that sleep will not run in a separate process group. So it's not the same as running sleep 20 & with regards to I/O to the terminal for instance.

Maybe better would be to use a shell that supports spawning disowned background jobs like zsh where you can do:

sleep 20 &! pid=$!

With bash, you can approximate it with:

{ sleep 20 2>&3 3>&- & } 3>&2 2> /dev/null; pid=$!; disown "$pid"

bash outputs the [1] 21578 to stderr. So again, we save stderr before redirecting to /dev/null, and restore it for the sleep command. That way, the [1] 21578 goes to /dev/null but sleep's stderr goes as usual.

If you're going to redirect everything to /dev/null anyway, you can simply do:

{ apt-get update & } > /dev/null 2>&1; pid=$!; disown "$pid"

To redirect only stdout:

{ apt-get update 2>&3 3>&- & } 3>&2 > /dev/null 2>&1; pid=$!; disown "$pid"
