Shell Scripting Parallelism – How to Execute a Program on Multiple Files in Parallel

Tags: parallelism, scripting, shell-script

I have a small script that loops through all files in a folder and runs a (usually long-running) command on each. Basically it's

for file in ./folder/*
do
    name=$(basename "$file")   # strip the ./folder/ prefix for the output name
    ./bin/myProgram "$file" > "./done/$name"
done

(Please ignore syntax errors; it's just pseudocode.)

I now want to run this script twice at the same time. Obviously, processing a file is unnecessary if its output in ./done already exists. So I changed the script to

for file in ./folder/*
do
    name=$(basename "$file")   # strip the ./folder/ prefix for the output name
    [ -f "./done/$name" ] || ./bin/myProgram "$file" > "./done/$name"
done

So basically the question is:
Is it possible that both scripts (or, in general, more than one instance) reach the same point at the same time, both check for the existence of the done file, both find it missing, and the command runs twice?

If that cannot happen, it would be just perfect, but I highly doubt it. This would be too easy 😀
If it can happen that they process the same file, is it possible to somehow "synchronize" the scripts?

Best Answer

This is possible and does happen in practice: both instances can run the existence check before either of them has created the output file. Use a lock file to avoid this situation. An example, from said page:

if mkdir /var/lock/mylock; then
    echo "Locking succeeded" >&2
else
    echo "Lock failed - exit" >&2
    exit 1
fi

# ... program code ...

rmdir /var/lock/mylock
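
Applied to the loop in the question, the same idea could look like the sketch below. It takes one lock per input file instead of one global lock, so several instances can work on different files at the same time. mkdir works as a lock because it either creates the directory or fails if it already exists, with no gap in between. The function name process_all, the ".lock" and ".tmp" suffixes, and the choice to keep the locks inside the done directory are assumptions made for illustration, not part of the original script.

```shell
#!/bin/sh
# Sketch: one lock directory per input file, so parallel instances of this
# script skip files that another instance has already claimed.
# process_all and the .lock/.tmp suffixes are hypothetical names.

process_all() {
    in_dir=$1
    done_dir=$2
    shift 2                      # remaining arguments: the command to run
    mkdir -p "$done_dir"
    for file in "$in_dir"/*; do
        [ -f "$file" ] || continue
        name=$(basename "$file")
        out="$done_dir/$name"
        lock="$out.lock"
        # Only one instance can succeed in creating the lock directory.
        if mkdir "$lock" 2>/dev/null; then
            # Re-check under the lock: another instance may have finished
            # this file and released its lock before we got here.
            if [ ! -f "$out" ]; then
                # Write to a temporary name so a failed or interrupted run
                # never leaves a partial file that looks "done".
                if "$@" "$file" > "$out.tmp"; then
                    mv "$out.tmp" "$out"
                else
                    rm -f "$out.tmp"
                fi
            fi
            rmdir "$lock"
        fi
    done
}

# Usage with the paths from the question:
# process_all ./folder ./done ./bin/myProgram
```

Note that if an instance is killed while it holds a lock, the lock directory stays behind and that file is skipped until the directory is removed by hand.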