I want to run some simulations using a Python tool that I wrote. The catch is that I have to call it multiple times with different parameters/arguments.
For now, I am using nested for loops for the task, like:
for simSeed in 1 2 3 4 5
do
    for launchPower in 17.76 20.01 21.510 23.76
    do
        python sim -a "$simSeed" -p "$launchPower"
    done
done
In order for the simulations to run simultaneously, I append a & at the end of the line where I call the simulator:
    python sim -a "$simSeed" -p "$launchPower" &
Using this method I am able to run multiple such seeds. However, since my computer has limited memory, I want to rewrite the above script so that it runs the inner for loop in parallel and the outer for loop sequentially.
As an example, for simSeed = 1, I want 4 different processes to run with launchPower equal to 17.76, 20.01, 21.510 and 23.76. As soon as this part is complete, I want the script to move on to simSeed = 2 and again run 4 parallel processes with the same launchPower values.
How can I achieve this task?
TLDR:
I want the outer loop to run sequentially and the inner loop to run in parallel, such that when the last parallel process of the inner loop finishes, the outer loop moves on to the next iteration.
Best Answer
GNU parallel has several options to limit resource usage when starting jobs in parallel.
The basic usage for two nested loops would be
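A minimal sketch of that basic usage, assuming GNU parallel is installed and reusing the sim call from the question. Each ::: introduces one input source, and {1} and {2} stand for the seed and the launch power. The --dry-run flag and the plan variable are additions for this sketch, so that the generated commands are printed rather than executed:

```shell
# One job per (seed, power) combination: 5 seeds x 4 powers = 20 jobs.
# --dry-run only prints the commands; drop it to actually run the simulations.
plan=$(parallel --dry-run python sim -a {1} -p {2} \
        ::: 1 2 3 4 5 \
        ::: 17.76 20.01 21.510 23.76) || echo "GNU parallel is required" >&2
printf '%s\n' "$plan"
```

Without further options, GNU parallel runs one job per CPU core at a time and starts the next job as soon as one finishes.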
If you want to launch at most 5 jobs at the same time, e.g., you could say
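As a sketch of the job limit, again assuming GNU parallel and the sim call from the question: the -j option (long form --jobs) caps how many jobs run at once. The --dry-run flag and the capped variable are only there for illustration:

```shell
# -j 5 keeps at most five simulations running at any time;
# the remaining combinations wait in the queue until a slot frees up.
# --dry-run only prints the commands; drop it to actually run them.
capped=$(parallel --dry-run -j 5 python sim -a {1} -p {2} \
        ::: 1 2 3 4 5 \
        ::: 17.76 20.01 21.510 23.76) || echo "GNU parallel is required" >&2
printf '%s\n' "$capped"
```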
Alternatively, you can use the --memfree option to start new jobs only when enough memory is free, e.g. at least 256 MByte.
Note that the last option will kill the most recently started job if the free memory falls below 50% of the "reserve" value stated (but it will be re-queued for catch-up automatically).
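The --memfree variant could look like the following sketch, assuming a GNU parallel version that supports the option and the sim call from the question. The size takes a suffix such as M or G. As before, --dry-run and the guarded variable are additions for illustration only:

```shell
# Start a new job only while at least 256 MByte of memory is free.
# --dry-run only prints the commands; drop it to actually run them.
guarded=$(parallel --dry-run --memfree 256M python sim -a {1} -p {2} \
        ::: 1 2 3 4 5 \
        ::: 17.76 20.01 21.510 23.76) || echo "GNU parallel with --memfree is required" >&2
printf '%s\n' "$guarded"
```

The 256M threshold is only the example value from the text. In practice it would be chosen from the memory footprint of a single sim run.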