Consider the following scenario. I have two programs, A and B. Program A outputs lines of strings to stdout, while program B processes lines from stdin. The way to use these two programs is of course:
foo@bar:~$ A | B
Now I've noticed that this eats up only one core; hence I am wondering:
Are programs A and B sharing the same computational resources? If so, is there a way to run A and B concurrently?
Another thing that I've noticed is that A runs much, much faster than B, hence I am wondering if I could somehow run more B programs and let them process the lines that A outputs in parallel.
That is, A would output its lines, and there would be N instances of program B that would read these lines (whichever reads them first), process them, and output them on stdout.
So my final question is:
Is there a way to pipe the output of A among several B processes without having to take care of race conditions and other inconsistencies that could potentially arise?
Best Answer
A problem with split --filter is that the output can be mixed up, so you get half a line from process 1 followed by half a line from process 2. GNU Parallel guarantees there will be no such mixup.
So assume you want to do:
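The original command here was elided, so the following is a runnable sketch of the serial pipeline, where printf stands in for A and tr a-z A-Z stands in for the slow program B (both are assumptions for illustration):

```shell
# Serial pipeline: all of A's output (produced here by printf)
# flows through a single instance of B (played here by tr),
# so only one core does the heavy work.
printf 'one\ntwo\nthree\n' | tr a-z A-Z
```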
But B is terribly slow, and thus you want to parallelize it. Then you can do:
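Again the exact command was elided; a runnable sketch of the parallelized version, with tr a-z A-Z standing in for B (an assumption), would be:

```shell
# Parallel pipeline: --pipe splits stdin into blocks of complete
# records and feeds each block to its own instance of B on stdin.
# Each job's output is printed whole, so lines are never interleaved.
printf 'one\ntwo\nthree\n' | parallel --pipe tr a-z A-Z
```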
GNU Parallel by default splits on \n with a block size of 1 MB. This can be adjusted with --recend and --block.
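For example, to shrink the block size while still splitting input on newlines, one might write (tr a-z A-Z again standing in for B, as an assumption):

```shell
# --block sets the approximate chunk size handed to each job;
# --recend '\n' ensures chunks are cut only at line boundaries,
# so no instance of B ever receives a partial line.
printf 'one\ntwo\nthree\n' | parallel --pipe --block 1k --recend '\n' tr a-z A-Z
```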
You can find more about GNU Parallel at: http://www.gnu.org/s/parallel/
You can install GNU Parallel in just 10 seconds with:
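The install one-liner from the GNU Parallel documentation fetches and runs a script from pi.dk (as with any curl-to-shell install, inspect the script before running it):

```shell
# Download the GNU Parallel installer with whichever fetcher is
# available and run it; installs into your home directory if you
# lack root.
(wget -O - pi.dk/3 || curl pi.dk/3/ || fetch -o - http://pi.dk/3) | bash
```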
Watch the intro video on http://www.youtube.com/playlist?list=PL284C9FF2488BC6D1