GNU Parallel Limit Memory Usage

gnu-parallel, memory, nice

Is it possible to limit the memory usage of all processes started by GNU parallel? I realize there are ways to limit the number of jobs, but in cases where it isn't easy to predict the memory usage ahead of time it can be difficult to tune this parameter.

In my particular case I'm running programs on an HPC cluster where there are hard limits on process memory. E.g. if there's 72 GB of RAM available on a node, the batch system will kill jobs that exceed 70 GB. I'm also unable to spawn jobs directly into swap and hold them there.

The GNU parallel package comes with niceload, which seems to allow the current memory usage to be checked before a process runs, but I'm not sure how to use it.

Best Answer

The short answer is:

ulimit -m 1000000
ulimit -v 1000000

which will limit each process to roughly 1 GB of RAM (ulimit takes its values in kilobytes).
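Because GNU parallel runs each job in its own shell when the command contains shell syntax, you can set the limit inside the command so that every job gets its own cap. As a sketch (myprog and the input arguments are placeholders, and the 1000000 KB value is the same ~1 GB example as above):

parallel 'ulimit -v 1000000; myprog {}' ::: input1 input2 input3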

Limiting the memory the "right" way is in practice extremely complicated. Let us say you have 1 GB of RAM, you start a new process every 10 seconds, and each process uses 1 MB more every second. After 140 seconds you will have something like this:

10██▎                                                          
20██████▍                                                      
30██████████▌                                                  
40██████████████▋                                              
50██████████████████▊                                          
60██████████████████████▉                                      
70███████████████████████████                                  
80███████████████████████████████▏                             
90███████████████████████████████████▎                         
100██████████████████████████████████████▍                     
110██████████████████████████████████████████▌                 
120██████████████████████████████████████████████▋             
130██████████████████████████████████████████████████▊         
140██████████████████████████████████████████████████████▉     

This sums up to 1050 MB of RAM, so now you need to kill something. Which is the right job to kill? Is it 140 (assuming it has run amok)? Is it 10 (because it has run for the least amount of time)?

In my experience, jobs where memory is an issue are typically either very predictable (e.g. transforming a bitmap) or very unpredictable. For the very predictable ones you can do the computation beforehand and see how many jobs can be run at once.
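For the predictable case, a minimal sketch (the ~5 GB per job, the 70 GB node limit, myprog, and *.bmp are all made-up placeholders): with roughly 70 GB usable and about 5 GB per job, you can run at most 14 jobs at a time and cap each one as above:

# ~70 GB usable / ~5 GB per job => at most 14 concurrent jobs
parallel -j 14 'ulimit -v 5000000; myprog {}' ::: *.bmp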

For the unpredictable ones, you ideally want the system to start a few jobs that take up a lot of memory, and when those are done, to start more jobs that take up less memory. But you do not know beforehand which jobs will take a lot, which will take a little, and which will run amok. Some jobs' normal life cycle is to run with little memory for a long time and then balloon to a much bigger size later on; it is very hard to tell those apart from jobs that have run amok.

When someone points me to a well-thought-out way to do this that makes sense for many applications, GNU Parallel will probably be extended with it.
