Shell – GNU Parallel doesn’t run the jobs until the program has exited

gnu-parallel scripting shell-script xargs zsh

When I run (rss-notifier's code is included at the end),

rss-notifier.zsh https://www.wuxiaworld.com/feed/chapters ".*"|parallel --null -k --lb echo {}

I get,

Title: Sovereign of the Three Realms - Chapter 1339: Tears Of Joy
Link: https://www.wuxiaworld.com/novel/sovereign-of-the-three-realms/sotr-chapter-1339

Title: Renegade Immortal - Chapter 1259 - Rebuking the Everlasting Sect, Flowing Time
Link: https://www.wuxiaworld.com/novel/renegade-immortal/rge-chapter-1259

Title: Condemning the Heavens - Chapter 242: Three-eyed Troublemaker
Link: https://www.wuxiaworld.com/novel/condeming-the-heavens/cth-chapter-242

Title: Condemning the Heavens - Chapter 241: Join Us
Link: https://www.wuxiaworld.com/novel/condeming-the-heavens/cth-chapter-241

But when I run,

rss-notifier.zsh https://www.wuxiaworld.com/feed/chapters ".*"|parallel --null -k --lb -N 2 echo {1} {2}

I get nothing because parallel waits for the program to exit first.

How can I solve this problem? I just want parallel to execute my command {1} ... {2} for each two null-separated strings it reads from stdin.

Here is rss-notifier:

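#!/usr/bin/env zsh
# Usage: rss-notifier.zsh FEED_URL TITLE_REGEX
# rsstail prints two lines per item (a Title line, then a Link line),
# so the loop consumes them in pairs.
rsstail -l -u "$1" -n 9 | while read -r line1
do
    read -r line2
    # ggrep is presumably GNU grep (e.g. as installed on macOS); -P enables PCRE.
    if ggrep -P --silent "$2" <<< "$line1" ; then
        # Emit both lines NUL-terminated, followed by a newline from echo.
        printf '%b' "$line1"'\0'"$line2"'\0'
        echo
    fi
done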

Or for a simpler reproducer:

(printf '%s\0' {1..4}; sleep 2) | parallel --null -k --lb -N 2 echo {1} {2}

Update: I would also be satisfied by any alternative utility that can accomplish my use case. It can be done with xargs (see "Passing multiple parameters via xargs"), but that is not very graceful.
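
As a rough sketch of that xargs approach: xargs -0 -n 2 hands the command two NUL-separated items at a time, and a small sh -c wrapper exposes them as $1 and $2. The function name pair_up and the sample titles are made up for illustration. Whether xargs starts each command before its input reaches end-of-file may depend on the implementation.

```shell
# pair_up is a hypothetical helper: it reads NUL-separated items on stdin
# and runs one command per pair of items.
pair_up() {
    # -0: items are NUL-terminated; -n 2: pass at most two items per command.
    # The "_" fills $0 of the inner shell so the items land in $1 and $2.
    xargs -0 -n 2 sh -c 'printf "%s | %s\n" "$1" "$2"' _
}

# Sample input (made up): two title/link pairs.
printf '%s\0' 'Title: A' 'Link: a' 'Title: B' 'Link: b' | pair_up
```

The inner printf stands in for whatever command should receive the pair; replacing it with a notifier call would follow the same pattern.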

Best Answer

You are being hit by two issues.

The first shows up when you run this:

(seq 200; sleep 20) | parallel -j10  -k echo

prints:

1
2

and then stalls until the sleep 20 is done.

A partial fix seems to be to move the call to start_more_jobs() out of the while loop in reaper():

--- a/src/parallel
+++ b/src/parallel
@@ -4062,9 +4062,8 @@ sub reaper {
        # $stiff = pid of dead process
        if(wantarray) {
            push(@pids_reaped,$stiff);
-       } else {
-           $children_reaped++;
        }
+       $children_reaped++;
         if($Global::sshmaster{$stiff}) {
             # This is one of the ssh -M: ignore
             next;
@@ -4112,12 +4111,12 @@ sub reaper {
             }
         }
        $job->cleanup();
-       start_more_jobs();
        if($opt::progress) {
            my %progress = progress();
            ::status_no_nl("\r",$progress{'status'});
        }
     }
+    if($children_reaped) { start_more_jobs(); }
     $opt::sqlmaster and $Global::sql->run("COMMIT;");
     debug("run", "done ");
     return wantarray ? @pids_reaped : $children_reaped;

This may cost some performance if you have many short-lived jobs. I have not measured how much.

The other part of the problem is due to a design decision in GNU Parallel.

Arguments in GNU Parallel are read using the diamond operator (<>), which reads a full record before continuing. A pipe fed by (...; sleep 20) only delivers end-of-file once sleep has exited, so a read that is waiting for more input blocks until then.

So when GNU Parallel has read the final byte, it has to wait for the sleep to finish to discover that this really is the end-of-file.
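
This end-of-file behaviour is a property of pipes rather than of GNU Parallel, and it can be sketched with a plain shell reader. The helper name eof_delay is hypothetical; it reports how many seconds pass between the last line arriving and the reader seeing end-of-file. The pipe stays open as long as any writer, here sleep, still holds it.

```shell
# eof_delay (hypothetical helper): read lines from stdin and print the
# number of seconds between the last line read and end-of-file.
eof_delay() {
    last=$(date +%s)
    while IFS= read -r _line; do
        last=$(date +%s)      # remember when the most recent line arrived
    done
    # read has failed, so end-of-file has now been seen
    echo $(( $(date +%s) - last ))
}

# The lines arrive at once, but end-of-file only arrives after sleep exits,
# so this should print roughly 2.
(printf '%s\n' 1 2 3; sleep 2) | eof_delay
```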

I see no easy way to change that part of the design.

Luckily this does not block the jobs from being run, as you can see if you run date. The jobs are started immediately; it is only the output that waits for the sleep:

(seq 20; sleep 5) | parallel -j10  -k 'date;echo'

In other words: your problem is not related to -N2. You cannot see the problem here:

(printf '%s\0' {1..4}; sleep 2) | parallel --null -k --lb -N 2 echo {1} {2}

But you can see the problem here. This pauses before the last 4-8 elements:

(printf '%s\0' {1..40}; sleep 2) | parallel -j4 --null -k --lb -N 2 echo {1} {2}

This pauses before the last 8-10 elements:

(printf '%s\0' {1..40}; sleep 2) | parallel -j8 --null -k --lb -N 2 echo {1} {2}

By running date you can see that the jobs are not started late; only the printing is postponed:

(printf '%s\0' {1..40}; sleep 2) | parallel -j4 --null -k --lb -N 2 'date;'echo {1} {2}