Bash – GNU parallel with for loop

Tags: bash, gnu-parallel, imagemagick

I've found answers close to this but fail to understand how to apply them to my case (I'm rather new to Bash). I'm trying to process a folder containing a large image sequence (100k+ files) with ImageMagick and would like to use GNU Parallel to speed things up.

This is the code I use (it processes 100 frames at a time to avoid running out of RAM):

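calcmethod1=mean
cd out1                   # assumes the frames live in out1; expand the glob only after cd
allframes=(*.png)

# Combine each block of 100 frames with -evaluate-sequence and write one image
# to ../out2, named after the block's first frame
for (( i=0; i < ${#allframes[@]}; i+=100 )); do
    convert "${allframes[@]:i:100}" -evaluate-sequence "$calcmethod1" \
        -channel RGB -normalize ../out2/"${allframes[i]}"
done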

How would I parallelize this? Most solutions I've found avoid the loop and use a pipe instead, but when I tried that, my script broke because the argument list got too long.

I guess what I want is for parallel to split the load, e.g. handing frames 0-99 to core 1, frames 100-199 to core 2, and so on?

Best Answer

Order

Your sample program does not seem to care about the order of the *.png files in the allframes array you construct (Bash expands *.png in lexical order, so 10.png comes before 2.png), but your comments led me to believe that order would matter:

I guess what I would want to do is to have parallel splitting the load like handing the first 100 frames to core 1, frames 100-199 to core 2 etc.?
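To see the ordering problem concretely, here is a small sketch that creates a few empty frame files in a temporary directory and compares Bash's glob order with sort -V. The file names are hypothetical, chosen only for the demonstration.

```shell
#!/usr/bin/env bash
# Sketch: compare Bash's glob order with version-sort order on hypothetical frames.
set -euo pipefail

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
touch 1.png 2.png 10.png          # hypothetical, empty frame files

# Pathname expansion sorts names as strings (locale collation),
# so 10.png lands before 2.png.
glob_order=(*.png)
echo "glob:    ${glob_order[*]}"

# sort -V compares the digit runs as numbers: 1.png 2.png 10.png
mapfile -t numeric_order < <(printf '%s\n' *.png | sort -V)
echo "sort -V: ${numeric_order[*]}"
```

The exact glob order of 1.png and 10.png depends on the locale, but in any locale 10.png sorts ahead of 2.png, which is what would scramble the batches.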

Bash

Therefore I'd start by changing how your script builds the allframes array, so that the files are stored in numeric order:

mapfile -t allframes < <(printf '%s\n' *.png | sort -V)

A NUL-delimited variant using sort -zV also copes with file names that contain spaces or newlines (mapfile -d '' requires Bash 4.4 or later):

mapfile -d '' -t allframes < <(printf '%s\0' *.png | sort -zV)

With the frames in numeric order, the first convert ... command your loop builds now looks like this:

$ convert 0.png 1.png 2.png 3.png 4.png 5.png 6.png 7.png 8.png 9.png \
          10.png 11.png 12.png 13.png 14.png 15.png 16.png 17.png 18.png \
          19.png 20.png 21.png 22.png 23.png 24.png 25.png 26.png 27.png \
          28.png 29.png 30.png 31.png 32.png 33.png 34.png 35.png 36.png \
          37.png 38.png 39.png 40.png 41.png 42.png 43.png 44.png 45.png \
          46.png 47.png 48.png 49.png 50.png 51.png 52.png 53.png 54.png \
          55.png 56.png 57.png 58.png 59.png 60.png 61.png 62.png 63.png \
          64.png 65.png 66.png 67.png 68.png 69.png 70.png 71.png 72.png \
          73.png 74.png 75.png 76.png 77.png 78.png 79.png 80.png 81.png \
          82.png 83.png 84.png 85.png 86.png 87.png 88.png 89.png 90.png \
          91.png 92.png 93.png 94.png 95.png 96.png 97.png 98.png 99.png \
          -evaluate-sequence mean -channel RGB -normalize ../out2/0.png
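Before running ImageMagick on 100k frames, a dry run can confirm how the loop slices the sorted array. This sketch does not call convert at all; it records which frames each batch would receive and where its output would go, using 250 hypothetical, empty frame files in a temporary directory (../out2 is only printed, never written).

```shell
#!/usr/bin/env bash
# Dry-run sketch of the batching loop: instead of running convert, it records
# which frames each batch would receive and where its output would go.
set -euo pipefail

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
touch {0..249}.png                # 250 hypothetical, empty frame files

# Numeric order, NUL-delimited (mapfile -d '' needs Bash 4.4+).
mapfile -d '' -t allframes < <(printf '%s\0' *.png | sort -zV)

batch=100
summary=()
for (( i = 0; i < ${#allframes[@]}; i += batch )); do
    chunk=("${allframes[@]:i:batch}")
    # Batches share no frames, so each one could run on a separate core.
    summary+=("${chunk[0]}..${chunk[-1]} -> ../out2/${allframes[i]}")
done
printf '%s\n' "${summary[@]}"
```

It prints three lines, the last being `200.png..249.png -> ../out2/200.png`: the final batch is simply shorter (50 frames). Because no batch depends on another, handing them to separate cores is safe.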

GNU Parallel

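Building on eschwartz's example, I put together a GNU Parallel version. Because printf is a Bash builtin, expanding *.png as its arguments does not run into the kernel's argument-list limit, and parallel then builds command lines of only 100 files each: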

$ printf '%s\n' *.png | sort -V | parallel -N100 --dryrun convert {} \
   -evaluate-sequence 'mean' -channel RGB -normalize ../out2/{1}

Again, the NUL-delimited variant with sort -zV, where parallel -0 reads NUL-separated input:

$ printf '%s\0' *.png | sort -zV | parallel -0 -N100 --dryrun convert {} \
   -evaluate-sequence 'mean' -channel RGB -normalize ../out2/{1}

NOTE: Both commands above use --dryrun, so parallel only prints the commands it would run instead of executing them. This helps to visualize what's happening; the first job looks like this (wrapped for readability):

$ convert 0.png 1.png 2.png 3.png 4.png 5.png 6.png 7.png 8.png 9.png 10.png \
         11.png 12.png 13.png 14.png 15.png 16.png 17.png 18.png 19.png \
         20.png 21.png 22.png 23.png 24.png 25.png 26.png 27.png 28.png \
         29.png 30.png 31.png 32.png 33.png 34.png 35.png 36.png 37.png \
         38.png 39.png 40.png 41.png 42.png 43.png 44.png 45.png 46.png \
         47.png 48.png 49.png 50.png 51.png 52.png 53.png 54.png 55.png \
         56.png 57.png 58.png 59.png 60.png 61.png 62.png 63.png 64.png \
         65.png 66.png 67.png 68.png 69.png 70.png 71.png 72.png 73.png \
         74.png 75.png 76.png 77.png 78.png 79.png 80.png 81.png 82.png \
         83.png 84.png 85.png 86.png 87.png 88.png 89.png 90.png 91.png \
         92.png 93.png 94.png 95.png 96.png 97.png 98.png 99.png \
         -evaluate-sequence mean -channel RGB -normalize ../out2/0.png

If you're satisfied with this output, simply remove the --dryrun switch to parallel, and rerun it.

$ printf '%s\0' *.png | sort -zV | parallel -0 -N100 convert {} \
    -evaluate-sequence 'mean' -channel RGB -normalize ../out2/{1}
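If GNU Parallel is not installed, GNU xargs can form the same 100-file batches. This is a swapped-in tool, not the method used above, and the sketch is a dry run over 250 hypothetical, empty frame files in a temporary directory: echo prints each convert command to commands.txt instead of running it. The -P value caps how many batches run at once; since each convert loads its whole batch into memory, a small value is safer than one job per core when RAM is tight (GNU Parallel's -j option plays the same role).

```shell
#!/usr/bin/env bash
# Dry-run sketch of the same batching with GNU xargs instead of GNU Parallel.
set -euo pipefail

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
touch {0..249}.png                # 250 hypothetical, empty frame files

# printf is a builtin, so the long *.png expansion never goes through exec.
# xargs -n 100 starts one bash per 100 files; -P 2 runs two batches at once.
printf '%s\0' *.png | sort -zV |
    xargs -0 -n 100 -P 2 bash -c '
        # "$@" is the batch and "$1" (its first frame) names the output file.
        # Drop the leading "echo" to actually run ImageMagick.
        echo convert "$@" -evaluate-sequence mean -channel RGB -normalize "../out2/$1"
    ' _ > commands.txt

# Batches may finish in any order with -P > 1; sort for a stable view.
sort -V commands.txt
```

The `_` fills `$0` of the inline script, so the 100 file names arrive as `$1` through `$100`, mirroring `{}` and `{1}` in the parallel command.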
