Bash – Make GNU Parallel not delay before executing arguments from STDIN

bash, fifo, gnu-parallel, parallelism, pipe

GNU Parallel, without any command line options, allows you to easily parallelize a command whose last argument is determined by a line of STDIN:

$ seq 3 | parallel echo
2
1
3

Note that parallel does not wait for EOF on STDIN before it begins executing jobs — running yes | parallel echo will begin printing infinitely many copies of y right away.

This behavior appears to change, however, if STDIN is relatively short:

$ { yes | ghead -n5; sleep 10; } | parallel echo

In this case, no output will be returned before sleep 10 completes.

This is just an illustration — in reality I'm attempting to read from a series of continually generated FIFO pipes where the FIFO-generating process will not continue until the existing pipes start to be consumed. For example, my command will produce a STDOUT stream like:

/var/folders/2b/1g_lwstd5770s29xrzt0bw1m0000gn/T/tmp.PFcggGR55i
/var/folders/2b/1g_lwstd5770s29xrzt0bw1m0000gn/T/tmp.UCpTBzI3J6
/var/folders/2b/1g_lwstd5770s29xrzt0bw1m0000gn/T/tmp.r2EmSLW0t9
/var/folders/2b/1g_lwstd5770s29xrzt0bw1m0000gn/T/tmp.5TRNeeZLmt

Manually cat-ing each of these files one at a time in a new terminal causes the FIFO-generating process to complete successfully. However, running printfifos | parallel cat does not work. Instead, parallel seems to block forever waiting for input on STDIN — if I modify the pipeline to printfifos | head -n4 | parallel cat, the deadlock disappears and the first four pipes are printed successfully.
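
To illustrate the situation, here is a minimal sketch of the kind of generator described (the names and data are assumed, not taken from the real printfifos):

# Create FIFOs one at a time; announce each path, then block until a reader
# consumes it before moving on to the next one.
for i in 1 2 3 4 5; do
    fifo=$(mktemp -u)                     # reserve a unique path
    mkfifo "$fifo"                        # turn it into a named pipe
    echo "$fifo"                          # announce the path on STDOUT
    echo "data for pipe $i" > "$fifo"     # blocks until something reads the FIFO
    rm "$fifo"
done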

This behavior seems to be connected to the --jobs|-j parameter. Whereas { yes | ghead -n5; sleep 10; } | parallel cat produces no output for 10 seconds, adding a -j1 option yields four lines of y almost immediately followed by a 10 second wait for the final y. Unfortunately this does not solve my problem — I need every argument to be processed before parallel can get EOF from reading STDIN. Is there any way to achieve this?
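
For reference, the -j1 variant described above (a restatement of the command from the question, not new behavior):

$ { yes | ghead -n5; sleep 10; } | parallel -j1 cat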

Best Answer

There is a bug in GNU Parallel: it only starts processing after having read one job for each jobslot. After that it reads one job at a time.

In older versions the output will also be delayed by the number of jobslots. Newer versions only delay output by a single job.

So if you sent one job per second to parallel -j10, it would read 10 jobs before starting any of them. With older versions you would then have to wait an additional 10 seconds before seeing the output from job 3.
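
As an illustration of the read-ahead (the command here is an assumed example, not from the answer), feeding one job per second to parallel -j4 produces no output for roughly four seconds, because four lines must be read before the first job starts:

for i in $(seq 8); do echo "$i"; sleep 1; done | parallel -j4 echo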

A workaround for the start-up limitation is to feed one dummy job per jobslot to parallel:

# Start a queue reader: parallel runs each full line appended to jobqueue.
true >jobqueue; tail -n+0 -f jobqueue | parallel &
# Seed one dummy job ('true') per jobslot so parallel starts immediately.
seq $(parallel --number-of-threads) | parallel -N0 echo true >> jobqueue
# now add the real jobs to jobqueue
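
For the FIFO case above, the real jobs have to be complete command lines, since the queue-reading parallel executes each line as a command. A sketch of an assumed adaptation (printfifos and the sed transformation are illustrative):

# Turn each announced FIFO path into a 'cat <path>' job line.
printfifos | sed 's/^/cat /' >> jobqueue
# The queue only ends when the background tail -f is terminated, e.g. kill %1.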

A workaround for the output delay is to use --linebuffer (but this will mix full lines from different jobs).
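
A short illustration of --linebuffer (assumed example): without it, each job's output is held until the job finishes; with it, lines are passed through as they are produced, so output from different jobs can interleave line by line:

seq 3 | parallel --linebuffer 'echo start {}; sleep 1; echo end {}'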
