Shell Parallelism – Executing Piped Commands in Parallel

parallelismshell

Consider the following scenario. I have two programs A and B. Program A outputs to stdout lines of strings while program B process lines from stdin. The way to use these two programs is of course:

foo@bar:~$ A | B

Now I've noticed that this eats up only one core; hence I am wondering:

Are programs A and B sharing the same computational resources? If so, is there a way to run A and B concurrently?

Another thing that I've noticed is that A runs much much faster than B, hence I am wondering if could somehow run more B programs and let them process the lines that A outputs in parallel.

That is, A would output its lines, and there would be N instances of programs B that would read these lines (whoever reads them first) process them and output them on stdout.

So my final question is:

Is there a way to pipe the output to A among several B processes without having to take care of race conditions and other inconsistencies that could potentially arise?

Best Answer

A problem with split --filter is that the output can be mixed up, so you get half a line from process 1 followed by half a line from process 2.

GNU Parallel guarantees there will be no mixup.

So assume you want to do:

 A | B | C

But that B is terribly slow, and thus you want to parallelize that. Then you can do:

A | parallel --pipe B | C

GNU Parallel by default splits on \n and a block size of 1 MB. This can be adjusted with --recend and --block.

You can find more about GNU Parallel at: http://www.gnu.org/s/parallel/

You can install GNU Parallel in just 10 seconds with:

$ (wget -O - pi.dk/3 || lynx -source pi.dk/3 || curl pi.dk/3/ || \
   fetch -o - http://pi.dk/3 ) > install.sh
$ sha1sum install.sh | grep 67bd7bc7dc20aff99eb8f1266574dadb
12345678 67bd7bc7 dc20aff9 9eb8f126 6574dadb
$ md5sum install.sh | grep b7a15cdbb07fb6e11b0338577bc1780f
b7a15cdb b07fb6e1 1b033857 7bc1780f
$ sha512sum install.sh | grep 186000b62b66969d7506ca4f885e0c80e02a22444
6f25960b d4b90cf6 ba5b76de c1acdf39 f3d24249 72930394 a4164351 93a7668d
21ff9839 6f920be5 186000b6 2b66969d 7506ca4f 885e0c80 e02a2244 40e8a43f
$ bash install.sh

Watch the intro video on http://www.youtube.com/playlist?list=PL284C9FF2488BC6D1