When specifying the option --jobs to GNU parallel, what exactly does it mean?
I execute:
parallel --jobs 10 ./program ::: {1..100}
where program is an intensive task, and the jobs are completely independent of each other. {1..100} represents symbolic inputs to each task. When I inspect the processes running on the PC, I find that there are often fewer than 10 jobs running simultaneously.
So what exactly is --jobs specifying?
Best Answer
As per the man page, --jobs is the maximum number of jobs that will run in parallel on each machine. It does not mean that the number of running jobs will always equal that maximum.
The first and foremost requirement for parallel computing is that jobs can be run independently and that their outputs can be combined to produce the same result as if the jobs had been run sequentially. If this is not possible, the task cannot be done in parallel.
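One way to picture "maximum" is a simplified model in which the number of jobs running at once is the smaller of the job slots and the inputs that have not finished yet. The sketch below is only that model, not GNU parallel's actual scheduler; the function name running_jobs and its two parameters are hypothetical and introduced here for illustration.

```shell
#!/bin/sh
# Simplified model (an assumption, not GNU parallel's real scheduler):
# jobs running at once = min(job slots, inputs not yet finished).
running_jobs() {
    slots=$1      # the value passed to --jobs
    pending=$2    # inputs that have not finished yet
    if [ "$pending" -lt "$slots" ]; then
        echo "$pending"
    else
        echo "$slots"
    fi
}

running_jobs 10 100   # plenty of work left: prints 10
running_jobs 10 2     # only 2 inputs left: prints 2
```

Under this model the count can only reach the --jobs value while at least that many inputs remain.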
Also, from the GNU parallel man page:
Now, if the file has only 2 lines, but you pass --jobs 10, parallel cannot run 10 jobs for 2 lines, since the smallest unit of input it takes is a line. So you will only see 2 jobs.
This is not specific to GNU parallel; it applies to pretty much any parallel computation engine.
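The two-line case can be tried directly. This is a minimal sketch, assuming GNU parallel is installed as parallel; the file name inputs.txt, its contents, and the echo command are placeholders introduced for illustration.

```shell
#!/bin/sh
# Hypothetical two-line input file in a scratch directory.
tmpdir=$(mktemp -d)
printf 'alpha\nbeta\n' > "$tmpdir/inputs.txt"

# Request up to 10 job slots, but supply only 2 lines of input.
# :::: reads arguments from a file, {} is the input line,
# {#} is the job sequence number, and --keep-order prints
# results in input order.
out=$(parallel --jobs 10 --keep-order 'echo "job {#} got {}"' :::: "$tmpdir/inputs.txt")
printf '%s\n' "$out"

# Count how many jobs actually ran (one output line per job).
njobs=$(( $(printf '%s\n' "$out" | wc -l) ))
echo "jobs run: $njobs"

rm -rf "$tmpdir"
```

Only one job per input line is started, so raising --jobs beyond the number of lines changes nothing here.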