Running GNU Parallel on 2 or more nodes with Slurm scheduler

gnu-parallel, slurm

I am trying to distribute independent runs of a process using GNU Parallel on an HPC cluster that uses the Slurm workload manager. Briefly, here is the data-analysis setup:

Script#1: myCommands

./myscript --input infile.txt --setting 1 --output out1
./myscript --input infile.txt --setting 2 --output out2
./myscript --input infile.txt --setting 3 --output out3
./myscript --input infile.txt --setting 4 --output out4

Script#2: run.sh

#!/bin/bash
#SBATCH --time=00:02:00
#SBATCH --nodes=2
#SBATCH --cpus-per-task=2

cat myCommands | parallel -j 4

This works, however it only uses one node. The two cores on that node are split into 4 threads to make room for the 4 jobs requested by parallel. That is not desirable.

My searching indicates I will need a nodefile and an sshloginfile to accomplish this, but I see no examples online that work with Slurm, only with the PBS system.
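For reference, something like the following is roughly what I have in mind, adapted from those PBS examples. It is untested: the use of scontrol show hostnames to build the node list and the --sshloginfile/--workdir options are my guesses at the Slurm equivalent, and it assumes passwordless SSH between the nodes and a shared filesystem.

#!/bin/bash
#SBATCH --nodes=2
#SBATCH --cpus-per-task=2

# Expand the allocation's node list into one hostname per line,
# which is the format parallel's --sshloginfile expects.
scontrol show hostnames "$SLURM_JOB_NODELIST" > nodefile.txt

# Run at most 2 jobs on each host listed in nodefile.txt
# (matching the 2 CPUs available per node).
cat myCommands | parallel --jobs 2 --sshloginfile nodefile.txt --workdir "$PWD"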

How can I make the script (1) use both nodes, and (2) not split cores into threads?

Best Answer

You can just do this with a round-robin srun (something like):

jobs=({1..4})
nodes=($(scontrol show hostname $SLURM_NODELIST))
for ((n = 0; n < ${#jobs[@]}; n++)); do
  # pick the next node round-robin
  index=$(expr $n % ${#nodes[@]})
  srun --nodes=1 --ntasks=1 --nodelist=${nodes[$index]} \
       --exclusive ./myscript --input infile.txt \
       --setting ${jobs[$n]} --output out${jobs[$n]} &
done
# wait for all backgrounded job steps to finish
wait

I presume --cpus-per-task=2 will be passed through to srun. Let me know if you run into any issues. I was messing around with parallel this morning, but I don't see how to fix this issue with it directly. Additionally, I found that if you scancel a job whose work is run through GNU Parallel, the running processes don't die unless they were launched with srun.
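If you do want to keep GNU Parallel for the dispatching, one sketch (untested on your setup; the bash -c {} pattern and the flag choices here are my own, not something I have verified) is to have parallel launch every command line as its own srun job step, so scancel can reach the processes and Slurm places each step on whichever node has free cores:

# parallel quotes {}, so bash -c receives the whole input line as a single
# argument and executes it inside a one-task job step.
cat myCommands | parallel -j 4 srun --nodes=1 --ntasks=1 --exclusive bash -c {}

With the steps managed by Slurm, they should also spread across both nodes rather than piling onto the first one.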
