Running GNU Parallel on 2 or more nodes with Slurm scheduler

gnu-parallel, slurm

I am trying to distribute independent runs of a process using GNU Parallel on an HPC cluster that uses the Slurm workload manager. Briefly, here is the data-analysis setup:

Script#1: myCommands

./myscript --input infile.txt --setting 1 --output out1
./myscript --input infile.txt --setting 2 --output out2
./myscript --input infile.txt --setting 3 --output out3
./myscript --input infile.txt --setting 4 --output out4

Script#2: run.sh

#!/bin/bash
#SBATCH --time=00:02:00
#SBATCH --nodes=2
#SBATCH --cpus-per-task=2

cat myCommands | parallel -j 4

This works, however it only uses one node. The two cores on that node are split into 4 threads to make room for the 4 jobs requested by parallel. That is not desirable.

My searching indicates I will need a nodefile and an sshloginfile to accomplish this, but I see no examples online that work with Slurm, only with the PBS system.
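For reference, something like the following is roughly what I have in mind, adapted from those PBS examples. It is untested: the use of scontrol show hostnames to build the node list and the --sshloginfile/--workdir options are my guesses at the Slurm equivalent, and it assumes passwordless SSH between the nodes and a shared filesystem.

#!/bin/bash
#SBATCH --nodes=2
#SBATCH --cpus-per-task=2

# Expand the allocation's node list into one hostname per line,
# which is the format parallel's --sshloginfile expects.
scontrol show hostnames "$SLURM_JOB_NODELIST" > nodefile.txt

# Run at most 2 jobs on each host listed in nodefile.txt
# (matching the 2 CPUs available per node).
cat myCommands | parallel --jobs 2 --sshloginfile nodefile.txt --workdir "$PWD"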

How can I make the script (1) use both nodes, and (2) not split cores into threads?

Best Answer

You can just do this with a round-robin srun (something like):

jobs=({1..4})
nodes=($(scontrol show hostname $SLURM_NODELIST))
for ((n = 0; n < ${#jobs[@]}; n++)); do
  # pick the next node round-robin
  index=$(expr $n % ${#nodes[@]})
  srun --nodes=1 --ntasks=1 --nodelist=${nodes[$index]} \
       --exclusive ./myscript --input infile.txt \
       --setting ${jobs[$n]} --output out${jobs[$n]} &
done
# wait for all backgrounded job steps to finish
wait

I presume --cpus-per-task=2 will be passed through to srun. Let me know if you run into any issues. I was messing around with parallel this morning, but I don't see how to fix this issue with it directly. Additionally, I found that if you scancel a job whose work is run through GNU Parallel, the running processes don't die unless they were launched with srun.
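If you do want to keep GNU Parallel for the dispatching, one sketch (untested on your setup; the bash -c {} pattern and the flag choices here are my own, not something I have verified) is to have parallel launch every command line as its own srun job step, so scancel can reach the processes and Slurm places each step on whichever node has free cores:

# parallel quotes {}, so bash -c receives the whole input line as a single
# argument and executes it inside a one-task job step.
cat myCommands | parallel -j 4 srun --nodes=1 --ntasks=1 --exclusive bash -c {}

With the steps managed by Slurm, they should also spread across both nodes rather than piling onto the first one.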
