Linux – Can GNU Parallel execute more parallel processes

gnu-parallellinux

Can I for example execute:

parallel -j 200 < list0

Where "list" has:

nice -n -20 parallel -j 100 < list2
nice -n -20 parallel -j 100 < list1

Would this be feasible/possible?

Best Answer

Not only is it possible; it is also recommended in some situations.

GNU Parallel takes around 10 ms to run a job. So if you have 8 cores and the jobs you run take less than 70 ms, then you will see GNU Parallel use 100% of a single core, and yet there will be idle time on other cores. Thus you will not use 100% of all cores.

The other situation where it is recommended is if you want to run more jobs than -j0 will do. Currently -j0 will run around 250 jobs in parallel unless you adjust some system limits. It makes perfect sense to run more than 250 jobs if the jobs are not limited by CPU and disk I/O. This is for example true if network latency is the limiting factor.

However, using 2 lists is not the recommended way to split up jobs. The recommended way is to use GNU Parallel to call GNU Parallel:

cat list0 | parallel -j20 --pipe parallel -j100

That will run 2000 jobs in parallel. To run more adjust -j. It is recommended that the outer (the 20) is at least the number of cores, so that there will be at least one GNU Parallel process on each core.

Using this technique you should have no problem starting 20000 jobs in parallel; when you get over 32000 processes things start acting up.

By first running:

echo 4194304 | sudo tee /proc/sys/kernel/pid_max

I was able to run:

seq 1000000 2000000000 |
  parallel -j16 --roundrobin --pipe parallel -j0 --pipe parallel -j0 sleep

which will start 1 million processes in parallel (it takes 300 G RAM on my system).

Related Solutions

Poor Man’s GNU Parallel implemented in ksh

If you want to parallelize on a machine with multiple cores, you can just use (GNU) xargs, e.g.:

echo seq_[0-9][0-9].gz | xargs -n 1 -P 16 ./crunching

Meaning: xargs starts up to 16 processes in parallel of ./crunching using 1 token from stdin for each process.

You can also use split in combination with xargs.

Or you can create a simple Makefile for Job execution and call make -f mymf -j $CORES (you need temporary files for this solution).

PS: The GNU parallel manual also includes some comparisons with other tools, including xargs and make, interestingly they write:

(Very early versions of GNU parallel were coincidently implemented using make -j).

GNU Parallel Limit Memory Usage

The short answer is:

ulimit -m 1000000
ulimit -v 1000000

which will limit each process to 1 GB RAM.

Limiting the memory the "right" way is in practice extremely complicated: Let us say you have 1 GB RAM. You start a process every 10 seconds and each process uses 1 MB more every second. So after 140 seconds you will have something like this:

10██▎                                                          
20██████▍                                                      
30██████████▌                                                  
40██████████████▋                                              
50██████████████████▊                                          
60██████████████████████▉                                      
70███████████████████████████                                  
80███████████████████████████████▏                             
90███████████████████████████████████▎                         
100██████████████████████████████████████▍                     
110██████████████████████████████████████████▌                 
120██████████████████████████████████████████████▋             
130██████████████████████████████████████████████████▊         
140██████████████████████████████████████████████████████▉

This sums up to 1050 MB RAM, so now you need kill something. What is the right job to kill? Is it 140 (assuming it ran amok)? Is it 10 (because it has run the least amount of time)?

In my experience jobs where memory is an issue are typically very predicable (e.g. transforming a bitmap) or very little predictable. For the very predictable ones you can do the computation beforehand and see how many jobs can be run.

For the unpredictable you ideally want the system to start few jobs that take up a lot of memory, and when they are done, you want the system to start more jobs that take up less memory. But you do not know beforehand which jobs will take a lot, which will take a little, and which ones run amok. Some jobs normal life cycle is to run with little memory for a long time and then balloon to a much bigger size later on. It is very hard to tell the difference between those jobs and jobs that run amok.

When someone points me to a well thought out way to do this in a way that will make sense for many applications, then GNU Parallel will probably be extended with that.

Best Answer

Related Solutions

Poor Man’s GNU Parallel implemented in ksh

GNU Parallel Limit Memory Usage

Related Question