Bash – Running multiple bash functions in the background and waiting until they return

Tags: background-process, bash

This is a simple script that runs the nvidia-smi command on multiple hosts and saves the output to a common file. The goal is to make the per-host commands run asynchronously.

Is an & at the end of the process_host call sufficient? Is my script correct?

#!/bin/bash

HOSTS=(host1 host2 host3)
OUTPUT_FILE=nvidia_smi.txt

rm $OUTPUT_FILE

process_host() {
    host=$1
    echo "Processing" $host
    output=`ssh ${host} nvidia-smi`
    echo ${host} >> $OUTPUT_FILE
    echo "$output" >> $OUTPUT_FILE
}

for host in ${HOSTS[@]}; do
    process_host ${host} &
done;

wait
cat $OUTPUT_FILE

Best Answer

Yes, your script is correct, but the output may be garbled or out of sequence, because all the background jobs append to the same file and finish in no particular order. It's better to have the function write its output to a separate file named after the host, and then let the main script concatenate the results (and clean up).
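The per-job-file idea can be tried without any remote hosts. The following is only a minimal sketch: fake_job is a hypothetical stand-in for the ssh call, and the names alpha, beta and gamma are made up for illustration.

```shell
#!/bin/bash
# fake_job is a hypothetical stand-in for: ssh "$host" nvidia-smi
fake_job() {
    printf '%s\n' "$1" "output line A from $1" "output line B from $1"
}

dir=$(mktemp -d)               # scratch directory for the per-job files
names=( alpha beta gamma )     # made-up job names

for name in "${names[@]}"; do
    fake_job "$name" >"$dir/$name.out" &    # each job writes to its own file
done

wait                           # block until every background job has exited

# Concatenate in list order, regardless of which job finished first.
combined=$(for name in "${names[@]}"; do cat "$dir/$name.out"; done)
rm -rf "$dir"
printf '%s\n' "$combined"
```

Because no two jobs share a file, nothing can interleave, and the order of the combined output is fixed by the list rather than by timing.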

Also, you should double-quote your variable expansions. Copy and paste the script into ShellCheck to see where this matters.
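To illustrate why the quoting matters, here is a small sketch. The filename with a space is made up for the example and is not taken from the question. An unquoted expansion undergoes word splitting, so a single value can turn into several arguments.

```shell
#!/bin/bash
# Hypothetical value containing a space, for illustration only.
file="nvidia smi.txt"

# Helper that reports how many arguments it received.
count_args() { echo "$#"; }

unquoted=$(count_args $file)     # word splitting: the value becomes two arguments
quoted=$(count_args "$file")     # quoted: the value stays one argument

echo "unquoted: $unquoted, quoted: $quoted"   # prints "unquoted: 2, quoted: 1"
```

With a command like rm, the unquoted form would try to remove two different files instead of the one intended.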

Maybe something like this:

#!/bin/bash

hosts=( host1 host2 host3 )
outfile="nvidia_smi.txt"

rm -f "$outfile"

function process_host {
    local host="$1"
    local hostout="$host.out"
    printf "Processing host '%s'\n" "$host"
    echo "$host" >"$hostout"
    ssh "$host" nvidia-smi >>"$hostout"
}

for host in "${hosts[@]}"; do
    process_host "$host" &
done

wait

for host in "${hosts[@]}"; do
    hostout="$host.out"
    cat "$hostout"
    rm -f "$hostout"
done >"$outfile"

cat "$outfile"

The last loop may be replaced by

cat "${hosts[@]/%/.out}" >"$outfile"
rm -f "${hosts[@]/%/.out}"
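As a quick check of what that expansion produces, the sketch below prints the generated names. In the pattern substitution, % anchors the (empty) pattern at the end of each element, so the replacement text is appended to every array entry. The files array is introduced here only for the demonstration.

```shell
#!/bin/bash
hosts=( host1 host2 host3 )

# Append ".out" to every element; quoting keeps each result a single word.
files=( "${hosts[@]/%/.out}" )

printf '%s\n' "${files[@]}"
# prints:
# host1.out
# host2.out
# host3.out
```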

Having another look in 2021 at my old answer from 2016, I would probably write the code like this today:

#!/bin/sh

set -- host1 host2 host3
outfile=nvidia_smi.txt

for host do
    ssh "$host" nvidia-smi >"$outfile-$host" &
done

echo 'Commands submitted, now waiting...'
wait

for host do
    cat "$outfile-$host"
    rm -f "$outfile-$host"
done >"$outfile"

echo 'Done.'

This is shorter, and since we don't really need any named arrays, we can run it all with /bin/sh in place of bash. The list of hosts to connect to is instead kept in the list of positional parameters, which "for host do" iterates over by default. I've also removed the informational output from within the loops, as well as the shell function (which really did not do very much).
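For readers unfamiliar with that idiom, a minimal sketch of how the positional parameters act as the list. The seen variable exists only for this illustration.

```shell
#!/bin/sh
set -- host1 host2 host3     # the positional parameters now hold the list
echo "count: $#"             # prints "count: 3"

seen=
for host do                  # with no "in" list, the loop runs over "$@"
    seen="$seen[$host]"
done

echo "$seen"                 # prints "[host1][host2][host3]"
```

Since "$@" is available in every POSIX shell, this gives one list for free without relying on bash arrays.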

A variant that uses proper temporary files instead of making up filenames that could potentially already be taken:

#!/bin/sh

set -- host1 host2 host3
outfile=nvidia_smi.txt

for host do
    tmpfile=$(mktemp)
    ssh "$host" nvidia-smi >"$tmpfile" &
    set -- "$@" "$tmpfile"
    shift
done

echo 'Commands submitted, now waiting...'
wait

for tmpfile do
    cat "$tmpfile"
    rm -f "$tmpfile"
done >"$outfile"

echo 'Done.'

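This replaces the hostnames in the positional parameters with the pathnames of the temporary files that contain each host's output. The for loop expands its list once, before the first iteration, so changing the positional parameters inside the loop body does not affect which hosts are visited.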
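The append-and-shift rotation can be observed in isolation. In this sketch the string "tmp-for-$host" is a made-up placeholder standing in for the pathname that mktemp would return, so the result is predictable.

```shell
#!/bin/sh
set -- host1 host2 host3

for host do
    # Placeholder for the mktemp pathname; "tmp-for-" is a made-up prefix.
    set -- "$@" "tmp-for-$host"   # append the new item at the end of the list
    shift                         # drop the oldest item from the front
done

result="$*"
echo "$result"   # prints "tmp-for-host1 tmp-for-host2 tmp-for-host3"
```

After as many iterations as there were hosts, every hostname has been shifted off the front and the list holds only the new items, in the original host order.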