Bash – Why does the parallel command print “Starting” and “Finished” at the same time

bash, command-line, gnu-parallel

ls *.txt | parallel 'echo Starting on file {}; mkdir {.}; cd {.}; longCMD3 ../{} > /dev/null; echo Finished file {}'

This one-liner mostly works, but even though longCMD3 takes about 3 minutes, the first and second echo commands for each file are printed almost at the same time.
I tried putting in

wait

before the final echo, but that made no difference.

How can I ensure that the final echo is only printed once longCMD3 is complete?

Here's an example

Assume I only have 4 cores:

ls
foo1.txt foo2.txt foo3.txt foo4.txt foo5.txt foo6.txt 

What I expected:

Starting on file foo1.txt
Starting on file foo2.txt
Starting on file foo3.txt
Starting on file foo4.txt

then a couple of minutes should pass while longCMD3 finishes on one of the files, and only then:

Finished file foo1.txt
Starting on file foo5.txt

But what I get is:

Starting on file foo1.txt
Finished file foo1.txt
Starting on file foo2.txt
Finished file foo2.txt
Starting on file foo3.txt
Finished file foo3.txt
Starting on file foo4.txt
Finished file foo4.txt

This continues for all 6 files: the Starting and Finished statements are printed simultaneously for each file, but a few minutes elapse between files.

Best Answer

For each file, the commands echo Starting on file foo.txt, mkdir foo, cd foo, longCMD3 ../foo.txt > /dev/null and echo Finished file foo.txt run sequentially, i.e. each command starts after the previous one has finished.

The commands for different files are interspersed. By default, the parallel command runs as many jobs in parallel as you have cores.

However, the output of the commands is not interleaved by default. That's why you don't see a bunch of “Starting” lines followed later by the corresponding “Finished” lines: parallel groups the output of each job together, buffering it until the job has finished, and only then prints it all at once. See the description of the --group option in the manual. Grouping doesn't make sense in your case, so turn it off with the --ungroup (-u) option, or switch to line-by-line output with --line-buffer.
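The behaviour you expected can be sketched without parallel at all, using plain shell background jobs and sleep as a stand-in for longCMD3 (the file names here are just placeholders): each job prints its “Starting” line the moment it begins, so all “Starting” lines appear before any “Finished” line. This is roughly what --line-buffer or --ungroup gives you.

```shell
#!/bin/sh
# Each job writes "Starting" immediately, sleeps (the stand-in for the
# long command), then writes "Finished". Run in the background with &,
# the jobs overlap, so both Starting lines print before any Finished line.
for f in foo1.txt foo2.txt; do
  { echo "Starting on file $f"; sleep 1; echo "Finished file $f"; } &
done
wait   # block until all background jobs have finished
```

With parallel's default grouping, by contrast, each job's two lines are held back and printed together when that job ends, which is exactly the pairing you observed.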

Some other corrections:

  • Parsing ls is not reliable. Pass the file names to parallel directly with :::.
  • If mkdir fails, you shouldn't proceed; more generally, if any command fails, the whole job should fail. An easy way to arrange that is to start the job script with set -e.

parallel --line-buffer 'set -e; echo Starting on file {}; mkdir {.}; cd {.}; longCMD3 ../{} > /dev/null; echo Finished file {}' ::: *.txt
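The effect of set -e can be demonstrated without parallel. In this small sketch (the path is a deliberately invalid example), mkdir fails because its parent directory doesn't exist, so the job aborts and the final echo never runs:

```shell
#!/bin/sh
# set -e makes the shell exit as soon as a command fails. Here mkdir
# fails (the parent directory does not exist), so "Finished" is never
# printed and the job's exit status is non-zero.
sh -c 'set -e
       mkdir /nonexistent-parent/job 2>/dev/null
       echo "Finished"' || echo "job failed, Finished never printed"
```

Without set -e, the job would carry on after the failed mkdir, then cd into a directory that was never created, and longCMD3 would run in the wrong place.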