ls *.txt | parallel 'echo Starting on file {}; mkdir {.}; cd {.}; longCMD3 ../{} > /dev/null; echo Finished file {}'
This one-liner partially works, except that although longCMD3 takes about 3 minutes, the first and second echo commands are printed almost at the same time.
I tried putting in wait before the final echo, but that made no difference.
How can I ensure that the final echo is only printed once longCMD3 is complete?
Here's an example. Assume I only have 4 cores:
ls
foo1.txt foo2.txt foo3.txt foo4.txt foo5.txt foo6.txt
What I expected:
Starting on file foo1.txt
Starting on file foo2.txt
Starting on file foo3.txt
Starting on file foo4.txt
then at least 2 minutes should pass while longCMD3 finishes on one of the files, and only then:
Finished file foo1.txt
Starting on file foo5.txt
But what I get is:
Starting on file foo1.txt
Finished file foo1.txt
Starting on file foo2.txt
Finished file foo2.txt
Starting on file foo3.txt
Finished file foo3.txt
Starting on file foo4.txt
Finished file foo4.txt
This continues for all 6 files: the Starting and Finished lines are printed simultaneously for each file, but a few minutes elapse between one file and the next.
Best Answer
For each file, the commands

echo Starting on file foo.txt
mkdir foo
cd foo
longCMD3 ../foo.txt > /dev/null
echo Finished file foo.txt

run sequentially, i.e. each command starts after the previous one has finished. The commands for different files are interspersed: by default, parallel runs as many jobs in parallel as you have cores.
However, the output of the jobs is not interspersed by default. This is why you don't see a bunch of "Starting" lines and then, later, the corresponding "Finished" lines: parallel groups the output of each job together, buffering it until the job is finished. See the description of the --group option in the manual. Grouping doesn't make sense in your case, so turn it off with the --ungroup (-u) option, or switch to line-by-line grouping with --line-buffer.
Some other corrections:
Don't parse the output of ls; pass the file names to parallel directly. If mkdir fails, you shouldn't proceed; in fact, if any command fails, you should arrange for the whole job to fail. An easy way to do that is to start the job script with set -e.