I am trying to distribute independent runs of a process using GNU Parallel on an HPC cluster that uses the Slurm workload manager. Briefly, here is the data analysis setup:
Script #1: myCommands
./myscript --input infile.txt --setting 1 --output out1
./myscript --input infile.txt --setting 2 --output out2
./myscript --input infile.txt --setting 3 --output out3
./myscript --input infile.txt --setting 4 --output out4
Script #2: run.sh
#!/bin/bash
#SBATCH --time=00:02:00
#SBATCH --nodes=2
#SBATCH --cpus-per-task=2
cat myCommands | parallel -j 4
This works, but it uses only one node. The two cores on that node are split into 4 threads to make room for the 4 jobs requested by parallel, which is not desirable.
My searching indicates I will need a nodefile and an sshloginfile to accomplish this, but I see no examples online that work with Slurm, only with the PBS system.
How can I make the script (1) use both nodes, and (2) not split cores into threads?
Best Answer
You can do this with a round-robin srun, launching each command as its own job step and cycling through the allocated nodes. I presume --cpus-per-task=2 will be passed on to srun. Let me know if you have any issues. I was experimenting with GNU Parallel this morning, but I don't see how to fix this issue with it directly. Additionally, I found that if you scancel a job that contains GNU Parallel jobs, the running processes don't die unless you launch them with srun.
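The round-robin srun idea could be sketched roughly as follows. This is a minimal illustration, not the answerer's exact script: the helper name pick_node is hypothetical, the commands are read from the myCommands file in the question, and the srun flags shown (in particular --exclusive) are an assumption about how your site's Slurm version handles job steps, so check the srun man page before relying on them.

```shell
#!/bin/bash
#SBATCH --time=00:02:00
#SBATCH --nodes=2
#SBATCH --cpus-per-task=2

# pick_node INDEX NODE...
# Hypothetical helper: print the node assigned to job INDEX,
# cycling through the given node names in round-robin order.
pick_node() {
    local index=$1
    shift
    local nodes=("$@")
    echo "${nodes[index % ${#nodes[@]}]}"
}

# Dispatch only when running inside a Slurm allocation.
if [ -n "${SLURM_JOB_NODELIST:-}" ]; then
    # Expand the compact node list (e.g. node[01-02]) into one hostname per entry.
    mapfile -t nodes < <(scontrol show hostnames "$SLURM_JOB_NODELIST")

    # Read the command lines up front so srun cannot consume them via stdin.
    mapfile -t cmds < myCommands

    for i in "${!cmds[@]}"; do
        # Skip blank lines in the command file.
        [ -z "${cmds[i]}" ] && continue
        node=$(pick_node "$i" "${nodes[@]}")
        # One job step per command, pinned to a single node, run in the background.
        srun --nodes=1 --ntasks=1 --cpus-per-task=2 \
             --nodelist="$node" --exclusive \
             bash -c "${cmds[i]}" < /dev/null &
    done

    # Block until every background job step has finished.
    wait
fi
```

Submitted with sbatch run.sh, this would place commands 1 and 3 on the first node and commands 2 and 4 on the second. If the allocation cannot run all steps at once, the remaining steps should wait for resources rather than share cores, though the exact behaviour depends on the Slurm version (newer releases also offer an --exact option for srun).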