Shell – How to create a bounded queue for shell tasks

Tags: command line, find, parallelism, shell, sort

I have 1000 gzipped files which I want to sort.

Doing this sequentially, the procedure looks pretty straightforward:

find . -name '*.gz' -exec sh -c 'zcat "$1" | sort > "$1.txt"' sh {} \;

I'm not sure the code above works (please correct me if I made a mistake somewhere), but I hope you understand the idea.

Anyway, I'd like to parallelize the gunzip/sort jobs to make the whole thing faster. However, I don't want all 1000 processes running simultaneously. It would be great to have some bounded job queue (like BlockingQueue in Java or BlockingCollection in .NET) with configurable capacity, so that only, say, 10 processes run in parallel at any time.

Is it possible to do this in shell?

Best Answer

A quick trip to Google reveals this interesting approach: http://pebblesinthesand.wordpress.com/2008/05/22/a-srcipt-for-running-processes-in-parallel-in-bash/

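NPROC=0
for ARG in "$@"; do
    command "$ARG" &              # "command" stands for the job to run on each argument
    NPROC=$((NPROC + 1))
    if [ "$NPROC" -ge 4 ]; then   # a batch of 4 jobs has been started
        wait                      # block until the whole batch has finished
        NPROC=0
    fi
done
wait    # wait for the final, possibly partial, batch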
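Note that the loop above works in batches: it starts 4 jobs, waits for all of them to finish, then starts the next 4. For something closer to a bounded queue, where a new job starts as soon as any slot frees up, one option is `xargs -P`. The following is only a minimal sketch, assuming an `xargs` that supports `-0` and `-P` (GNU and BSD versions do; they are not required by POSIX). The function name `sort_gz_tree` and its arguments are made up for illustration, and `gzip -dc` is used in place of `zcat` as an equivalent that behaves the same across systems.

```shell
#!/bin/sh
# Hypothetical helper: sort every *.gz file under a directory,
# keeping at most max_jobs decompress/sort pipelines running at once.
#   $1 = directory to search
#   $2 = maximum number of parallel jobs (defaults to 10)
sort_gz_tree() {
    dir=$1
    max_jobs=${2:-10}
    # -print0 / -0 keep file names with spaces or newlines intact.
    # -n 1 hands each sh invocation exactly one file name.
    # -P caps how many of those invocations run concurrently.
    find "$dir" -type f -name '*.gz' -print0 |
        xargs -0 -n 1 -P "$max_jobs" \
            sh -c 'gzip -dc "$1" | sort > "$1.txt"' sh
}
```

For the case in the question this would be called as `sort_gz_tree . 10`, writing each sorted result next to its input as `NAME.gz.txt`.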