Does “parallel --jobs 10” mean that exactly 10 jobs will run?

gnu-parallel

When specifying the option --jobs to GNU parallel, what exactly does it mean?

I execute:

parallel --jobs 10 ./program ::: {1..100}

where program is an intensive task, and the jobs are completely independent of each other. {1..100} represents symbolic inputs to each task. When I inspect the processes running on the PC, I often find that fewer than 10 jobs are running simultaneously.

So what exactly is --jobs specifying?

Best Answer

As per the man page, --jobs is the maximum number of jobs that will run in parallel on each machine (note the phrase "up to"):

--jobs N

Number of jobslots on each machine. Run up to N jobs in parallel. 0 means as many as possible.

So N is an upper limit, not a guarantee: the number of jobs running at any given moment can be lower. The first and foremost requirement for parallel computing is that the jobs can run independently and that their combined output is the same as if they had run sequentially. If that is not possible, the task cannot be done in parallel.

Also, from the GNU parallel man page:

GNU parallel is a shell tool for executing jobs in parallel using one or more computers. A job can be a single command or a small script that has to be run for each of the lines in the input.

Now, if the input has only 2 lines but you pass --jobs 10, parallel cannot run 10 jobs for 2 lines, since the smallest unit of input it takes is a line. So you will see only 2 jobs.

This is not just the case with GNU parallel, but pretty much any parallel computation engine.
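To illustrate that this holds outside GNU parallel too, here is a minimal sketch using a thread pool from Python's standard library as a stand-in for job slots. It is only an analogy, not how GNU parallel is implemented; the function name peak_concurrency and the sleep that stands in for ./program are made up for the example.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def peak_concurrency(slots, inputs, work_seconds=0.05):
    """Run one job per input on a pool with `slots` workers.

    Returns the highest number of jobs observed running at the same time.
    (Hypothetical helper, written only for this illustration.)
    """
    lock = threading.Lock()
    running = 0
    peak = 0

    def job(_item):
        nonlocal running, peak
        with lock:
            running += 1
            peak = max(peak, running)
        time.sleep(work_seconds)  # stand-in for the real work (./program)
        with lock:
            running -= 1

    # max_workers plays the role of --jobs: an upper bound on concurrency.
    with ThreadPoolExecutor(max_workers=slots) as pool:
        list(pool.map(job, inputs))
    return peak


if __name__ == "__main__":
    # 10 slots, 2 inputs: the peak can never exceed 2.
    print(peak_concurrency(10, ["line 1", "line 2"]))
    # 10 slots, 100 inputs: the peak can never exceed 10.
    print(peak_concurrency(10, range(100)))
```

The pool size caps concurrency from above, and the number of inputs caps it as well, so the observed peak is at most the smaller of the two. How close a real run gets to the cap depends on timing.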
