Bash – Pass input to multiple commands and compare their outputs

bash, command line, process-substitution, shell-script

I am trying to pass standard input to multiple commands and compare their outputs. My current attempt seems close but doesn't quite work, and it relies on temporary files, which I feel should not be necessary.

An example of what I would want my script to do:

$ echo '
> Line 1
> Line B
> Line iii' | ./myscript.sh 'sed s/B/b/g' 'sed s/iii/III/' 'cat'
1:Line B     2:Line b
1:Line iii   3:Line III

So far I have this:

i=0
SOURCES=()
TARGETS=()

for c in "$@"; do
    SOURCES+=(">($c > tmp-$i)")
    TARGETS+=("tmp-$i")
    i=$((i+1))
done

eval tee ${SOURCES[@]} >/dev/null <&0
comm ${TARGETS[@]}

The issues are:

  • There seems to be a race condition. After the script has finished, running comm tmp-0 tmp-1 by hand gives the desired output (more or less), but when comm is run from within the script the output seems non-deterministic.
  • comm is limited to just 2 inputs, but I need at least 3 (ideally any number).
  • This creates temporary files that I would have to keep track of and delete afterwards; an ideal solution would use only redirection.

The constraints are:

  • The input may be unending. In particular, the input could be something like /dev/zero or /dev/urandom, so merely copying the input to a file won't work.
  • The commands may have spaces in them and be fairly complicated themselves
  • I want a line-by-line, in-order comparison.

Any idea how I could go about implementing this? I basically want something like echo $input | tee >(A >?) >(B >?) >(C >?) ?(compare-all-files) if only such a syntax existed.

Best Answer

Since the accepted answer is using perl, you can just as well do the whole thing in perl, without other non-standard tools or non-standard shell features, and without loading unpredictably long chunks of data into memory, or other such horrible misfeatures.

The ytee script from the end of this answer, when used in this manner:

ytee command filter1 filter2 filter3 ...

will work just like

command <(filter1) <(filter2) <(filter3) ...

with its standard input piped to filter1, filter2, filter3, ... in parallel, as if by

tee >(filter1) >(filter2) >(filter3) ...
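
For comparison, the same plumbing can be sketched in plain bash with named pipes. This is not the ytee script, only an illustration under assumptions: the function name fifo_tee is made up, mktemp -d and mkfifo are assumed to be available, and it still creates (short-lived) FIFOs on disk, which ytee avoids. It also has none of ytee's handling of filters that exit early.

```shell
#!/usr/bin/env bash
# fifo_tee COMMAND FILTER...   (hypothetical name, illustration only)
# Roughly: COMMAND <(FILTER1) <(FILTER2) ... with stdin copied to every filter.
fifo_tee() {
    local cmd=$1; shift
    local dir f i=0 status
    local -a ins=() outs=() pids=()
    dir=$(mktemp -d) || return 1

    for f in "$@"; do
        mkfifo "$dir/in$i" "$dir/out$i" || return 1
        # Each filter reads its own copy of the input from one FIFO
        # and writes its result to another.
        sh -c "$f" <"$dir/in$i" >"$dir/out$i" &
        pids+=("$!")
        ins+=("$dir/in$i"); outs+=("$dir/out$i")
        i=$((i + 1))
    done

    # COMMAND gets the output FIFOs as file arguments. It runs in the
    # background so that tee can read our stdin in the foreground.
    sh -c "$cmd"' "$@"' sh "${outs[@]}" &
    local cmd_pid=$!

    # tee copies stdin into every input FIFO.
    tee "${ins[@]}" >/dev/null

    wait "$cmd_pid"; status=$?
    if [ "${#pids[@]}" -gt 0 ]; then wait "${pids[@]}"; fi
    rm -rf "$dir"
    return "$status"
}
```

Something like printf 'Line 1\nLine B\n' | fifo_tee paste 'sed s/B/b/g' cat should then put the two filter outputs side by side. A command that reads one file to the end before touching the next could stall on large inputs once the pipe buffers fill, so this sketch is best paired with line-by-line consumers such as paste.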

Example:

echo 'Line 1
Line B
Line iii' | ytee 'paste' 'sed s/B/b/g | nl' 'sed s/iii/III/ | nl'
     1  Line 1       1  Line 1
     2  Line b       2  Line B
     3  Line iii             3  Line III

This is also an answer for the two very similar questions: here and here.

ytee:

#! /usr/bin/perl
#   usage: ytee [-r irs] { command | - } [filter ..]
use strict;
if($ARGV[0] =~ /^-r(.+)?/){ shift; $/ = eval($1 // shift); die $@ if $@ }
elsif(! -t STDIN){ $/ = \0x8000 }
my $cmd = shift;
my @cl;
for(@ARGV){
    use IPC::Open2;
    my $pid = open2 my $from, my $to, $_;
    push @cl, [$from, $to, $pid];
}
defined(my $pid = fork) or die "fork: $!";
if($pid){
    delete $$_[0] for @cl;
    $SIG{PIPE} = 'IGNORE';
    my ($s, $n);
    while(<STDIN>){
        for my $c (@cl){
            next unless exists $$c[1];
            syswrite($$c[1], $_) ? $n++ : delete $$c[1]
        }
        last unless $n;
    }
    delete $$_[1] for @cl;
    while((my $p = wait) > 0){ $s += !!$? << ($p != $pid) }
    exit $s;
}
delete $$_[1] for @cl;
if($cmd eq '-'){
    my $n; do {
        $n = 0; for my $c (@cl){
            next unless exists $$c[0];
            if(my $d = readline $$c[0]){ print $d; $n++ }
            else{ delete $$c[0] }
        }
    } while $n;
}else{
    exec join ' ', $cmd, map {
        use Fcntl;
        fcntl $$_[0], F_SETFD, fcntl($$_[0], F_GETFD, 0) & ~FD_CLOEXEC;
        '/dev/fd/'.fileno $$_[0]
    } @cl;
    die "exec $cmd: $!";
}

notes:

  1. code like delete $$_[1] for @cl will not only remove the file handles from the array, but will also close them immediately, because there's no other reference pointing to them; this is different from (properly) garbage collected languages like javascript.

  2. the exit status of ytee will reflect the exit statuses of the command and filters: as written, a failing command adds 1 to it and each failing filter adds 2; this could be changed/simplified.
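
Why closing the handles promptly matters (note 1) is not specific to perl: a process reading from a pipe only sees end-of-file once every copy of the write end has been closed. A minimal bash sketch with a named pipe, assuming mktemp and mkfifo are available; the function name eof_demo and the use of fd 3 are arbitrary choices for illustration:

```shell
#!/usr/bin/env bash
# eof_demo (hypothetical name): count the lines written through a FIFO.
eof_demo() {
    local dir
    dir=$(mktemp -d) || return 1
    mkfifo "$dir/pipe" || return 1

    # Reader: wc only finishes once it sees EOF on the FIFO.
    wc -l <"$dir/pipe" >"$dir/count" &
    local reader=$!

    exec 3>"$dir/pipe"   # open the write end on fd 3
    echo one >&3
    echo two >&3
    exec 3>&-            # close it; while fd 3 stays open, wc keeps waiting

    wait "$reader"
    cat "$dir/count"
    rm -rf "$dir"
}
```

Leaving out the exec 3>&- line makes the wait hang, which is the situation ytee avoids by deleting the handles it no longer needs.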
