Bash – Run asynchronous tasks and retrieve their exit code and output in bash

Tags: bash, command line, linux, process-substitution, shell-script

I have to run a bunch of bash commands asynchronously and, as soon as one finishes, act on its exit code and output. Note that I can't predict how long any of these tasks will run in my real use case.

To solve this problem, I ended up with the following algorithm:

For each task to be run:
    Run the task asynchronously;
    Append the task to the list of running tasks.
End For.

While there still are tasks in the list of running tasks:
    For each task in the list of running tasks:
        If the task has ended:
            Retrieve the task's exit code and output;
            Remove the task from the list of running tasks.
        End If.
    End For.
End While.

This gives me the following bash script:

  1 #!/bin/bash
  2 
  3 # bg.sh
  4 
  5 # Executing commands asynchronously, retrieving their exit codes and outputs upon completion.
  6 
  7 asynch_cmds=
  8 
  9 echo -e "Asynchronous commands:\nPID    FD"
 10 
 11 for i in {1..10}; do
 12         exec {fd}< <(sleep $(( i * 2 )) && echo $RANDOM && exit $i) # Dummy asynchronous task, standard output's stream is redirected to the current shell
 13         asynch_cmds+="$!:$fd " # Append the task's PID and FD to the list of running tasks
 14         
 15         echo "$!        $fd"
 16 done    
 17 
 18 echo -e "\nExit codes and outputs:\nPID       FD      EXIT    OUTPUT"
 19 
 20 while [[ ${#asynch_cmds} -gt 0 ]]; do # While the list of running tasks isn't empty
 21         
 22         for asynch_cmd in $asynch_cmds; do # For each task in the list
 23                 
 24                 pid=${asynch_cmd%:*} # Task's PID
 25                 fd=${asynch_cmd#*:} # Task's FD
 26                 
 27                 if ! kill -0 $pid 2>/dev/null; then # If the task ended
 28                         
 29                         wait $pid # Retrieving the task's exit code
 30                         echo -n "$pid   $fd     $?      "
 31                         
 32                         echo "$(cat <&$fd)" # Retrieving the task's output
 33                         
 34                         asynch_cmds=${asynch_cmds/$asynch_cmd /} # Removing the task from the list
 35                 fi
 36         done
 37 done

The output shows that wait fails to retrieve the exit code of every task except the last one started:

Asynchronous commands:
PID     FD
4348    10
4349    11
4351    12
4353    13
4355    14
4357    15
4359    16
4361    17
4363    18
4365    19

Exit codes and outputs:
PID     FD  EXIT OUTPUT
./bg.sh: line 29: wait: pid 4348 is not a child of this shell
4348    10  127  16010
./bg.sh: line 29: wait: pid 4349 is not a child of this shell
4349    11  127  8341
./bg.sh: line 29: wait: pid 4351 is not a child of this shell
4351    12  127  13814
./bg.sh: line 29: wait: pid 4353 is not a child of this shell
4353    13  127  3775
./bg.sh: line 29: wait: pid 4355 is not a child of this shell
4355    14  127  2309
./bg.sh: line 29: wait: pid 4357 is not a child of this shell
4357    15  127  32203
./bg.sh: line 29: wait: pid 4359 is not a child of this shell
4359    16  127  5907
./bg.sh: line 29: wait: pid 4361 is not a child of this shell
4361    17  127  31849
./bg.sh: line 29: wait: pid 4363 is not a child of this shell
4363    18  127  28920
4365    19  10   28810

The output of the commands is retrieved flawlessly, but I don't understand where this "is not a child of this shell" error comes from. I must be doing something wrong, since wait is able to get the exit code of the last command started asynchronously.

Does anyone know where this error comes from? Is my solution to this problem flawed, or am I misunderstanding the behavior of bash? I'm having a hard time understanding the behavior of wait.

P.S: I posted this question on Super User, but on second thought, it might be better suited to the Unix & Linux Stack Exchange.

Best Answer

This is a bug/limitation: bash only lets you wait for the most recent process substitution, even if you saved the earlier values of $! in other variables.

Simpler testcase:

$ cat script
exec 7< <(sleep .2); pid7=$!
exec 8< <(sleep .2); pid8=$!
echo $pid7 $pid8
echo $(pgrep -P $$)
wait $pid7
wait $pid8

$ bash script
6030 6031
6030 6031
/tmp/sho: line 9: wait: pid 6030 is not a child of this shell

This happens despite pgrep -P actually listing the process as a child of the shell, and strace showing that bash does reap it.

In any case, $! also being set to the PID of the last process substitution is an undocumented feature (which, IIRC, did not work in older versions) and is subject to some gotchas.
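
Because relying on $! for process substitutions is fragile, one possible way around it is to skip wait entirely and have each task report its own exit status through the stream that is already being read. The following is only a sketch of that idea. run_with_status and the EXIT: marker are hypothetical names introduced here for illustration, not something bash provides, and it assumes a bash recent enough to support the {fd} redirection used in the question:

```shell
#!/bin/bash

# Hypothetical helper: run a command, then print its exit status as a
# trailing "EXIT:<code>" line on the same stream as its output.
run_with_status() {
    local rc=0
    "$@" || rc=$?
    echo "EXIT:$rc"
}

# Dummy task: prints "hello" and exits with status 3.
exec {fd}< <(run_with_status sh -c 'echo hello; exit 3')

output=$(cat <&"$fd")      # blocks until the task closes its end of the pipe
exec {fd}<&-               # close the descriptor once everything is read

status=${output##*EXIT:}   # text after the last EXIT: marker
output=${output%EXIT:*}    # everything before that marker
output=${output%$'\n'}     # drop the newline that preceded the marker

echo "status=$status output=$output"
```

This should print status=3 output=hello. Only the task's standard output travels through the descriptor, so anything written to standard error is not captured by this sketch.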


This happens because bash only keeps track of the last process substitution in the last_procsub_child variable. This is where wait will look for the pid:

-- jobs.c --
/* Return the pipeline that PID belongs to.  Note that the pipeline
   doesn't have to belong to a job.  Must be called with SIGCHLD blocked.
   If JOBP is non-null, return the index of the job containing PID.  */
static PROCESS *
find_pipeline (pid, alive_only, jobp)
     pid_t pid;
     int alive_only;
     int *jobp;         /* index into jobs list or NO_JOB */
{
     ...
  /* Now look in the last process substitution pipeline, since that sets $! */
  if (last_procsub_child)
    {

However, that entry is discarded as soon as a new process substitution is created:

-- subst.c --
static char *
process_substitute (string, open_for_read_in_child)
     char *string;
     int open_for_read_in_child;
{
   ...
      if (last_procsub_child)
        discard_last_procsub_child ();
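
Given that only one process substitution is remembered at a time, a possible alternative is to start the tasks as ordinary background jobs with &, which bash does track as children, so wait can return each exit code. The sketch below is not the question's method: it swaps the file descriptors for one temporary file per task, and the names running, results and tmpdir are made up for illustration. It also assumes the temporary directory path contains no whitespace, because the list is split on spaces as in the original script:

```shell
#!/bin/bash

tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT

running=""                  # space-separated "PID:FILE" entries
for i in 1 2 3; do
    out="$tmpdir/task$i.out"
    # Dummy task run as a plain background job, stdout captured in a file.
    ( sleep "0.$i"; echo "result-$i"; exit "$i" ) >"$out" &
    running+=" $!:$out"
done

results=""                  # collected "STATUS=OUTPUT" entries
while [[ -n $running ]]; do
    still_running=""
    for entry in $running; do
        pid=${entry%%:*}    # part before the first colon
        out=${entry#*:}     # part after the first colon
        if kill -0 "$pid" 2>/dev/null; then
            still_running+=" $entry"     # not finished yet, keep it listed
        else
            status=0
            wait "$pid" || status=$?     # exit code of the finished job
            echo "$pid $status $(cat "$out")"
            results+=" $status=$(cat "$out")"
        fi
    done
    running=$still_running
    sleep 0.1               # avoid spinning at full speed while polling
done
```

The short sleep keeps the polling loop from burning CPU. On bash 4.3 or later, wait -n could probably replace the kill -0 polling, at the cost of having to work out which job just finished.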