Efficient way to use all cores in bash or zsh script

Tags: bash, multi-core, shell-script, zsh

If I want to process a large number of files with a command "do_something" that can only use one core, what's the best way to use all available cores, assuming each file can be processed independently?

Currently I do something like this:

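#!/bin/zsh
TASK_LIMIT=8    # maximum number of concurrent jobs
TASKS=0         # jobs started in the current batch
# *(.) is a zsh glob qualifier: regular files only
for i in *(.)
{
  do_something "$i" &
  TASKS=$((TASKS + 1))
  if [[ $TASKS -ge $TASK_LIMIT ]]; then
    wait        # blocks until ALL jobs in this batch have finished
    TASKS=0
  fi
}
wait            # wait for the final, possibly partial, batch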

Obviously, this is not efficient: once $TASK_LIMIT jobs have been started, the script waits until every "do_something" in that batch has finished before starting any new ones. For example, in my real script I use only about 500% of my 8-core CPU instead of more than 700%.

Running without $TASK_LIMIT is not an option because "do_something" may consume lots of memory.

Ideally, the script should try to keep the number of parallel tasks at $TASK_LIMIT: for example, if task 1 of 8 has finished and there is at least one more file to process, the script should start the next "do_something" instead of waiting for the remaining 7 tasks to finish. Is there a way to achieve this in zsh or bash?
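The "refill a slot as soon as any job exits" behaviour can be sketched in bash, assuming bash 4.3 or newer, where `wait -n` waits for any single background job to finish. `do_something` here is a hypothetical placeholder that just sleeps and prints, and `run_pool` is a name chosen for illustration only:

```bash
#!/usr/bin/env bash
# Keep up to TASK_LIMIT jobs running at once; start a new one as soon as
# any running job exits. Needs bash >= 4.3 for `wait -n`.
TASK_LIMIT=8

# Hypothetical placeholder for the real single-core command.
do_something() {
  sleep 0.2
  echo "processed: $1"
}

# Hypothetical helper: process every regular file given as an argument.
run_pool() {
  local running=0 f
  for f in "$@"; do
    [[ -f $f ]] || continue          # skip non-regular files, roughly like zsh's *(.)
    if (( running >= TASK_LIMIT )); then
      wait -n                        # block until any ONE job exits
      running=$((running - 1))
    fi
    do_something "$f" &
    running=$((running + 1))
  done
  wait                               # wait for the jobs still running
}
```

Called as `run_pool ./*`, this behaves like the zsh loop in the question, except that a finished job frees its slot immediately instead of the whole batch of eight having to drain first. Because it relies on `wait -n`, it needs bash rather than zsh.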

Best Answer

I strongly suggest having a look at GNU parallel. It does exactly what you want and doesn't depend on any particular shell.
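A minimal sketch of the GNU parallel approach, assuming GNU parallel (not the unrelated `parallel` from moreutils) is installed and the script runs under bash. `do_something` is again a hypothetical stand-in and `process_all` an illustrative wrapper name. GNU parallel runs each command in a new shell, so a shell function must first be exported with `export -f`; `-j` sets the number of job slots, and a new job starts as soon as a slot frees up:

```bash
#!/usr/bin/env bash
# Sketch using GNU parallel; assumes GNU parallel is installed.
TASK_LIMIT=8

# Hypothetical placeholder for the real single-core command.
do_something() {
  echo "processed: $1"
}
# GNU parallel starts each job in a new shell, so the function must be exported.
export -f do_something

# Hypothetical wrapper: run do_something on every argument, at most
# TASK_LIMIT at a time. GNU parallel quotes each argument, so names with
# spaces are passed through intact.
process_all() {
  parallel -j "$TASK_LIMIT" do_something ::: "$@"
}
```

For example, `process_all ./*` in bash. From zsh, `parallel -j 8 do_something ::: *(.)` works the same way when `do_something` is a standalone executable. If `-j` is omitted, GNU parallel defaults to one job per CPU thread.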
