Shell – Efficient way of comparing in awk

awk, linux, scripting, shell

#!/bin/awk -f
BEGIN {
        while((getline var < compareTo) > 0)
        {
                orderIds[var]=var;
        }
}
{
        if(orderIds[$0] == "")
        {
                print $0;
        }
}

Running as

awk -v compareTo="ids.log.remote" -f sample.awk ids.log.local

This works, but instead of using an associative array (like a HashMap), is there anything like a HashSet in awk?

Here are the timings I got:

bash-3.2$ time grep -xFvf ids.log.local ids.log.remote > /dev/null

real    0m0.130s
user    0m0.127s
sys     0m0.002s
bash-3.2$ time grep -xFvf ids.log.local ids.log.remote > /dev/null

real    0m0.126s
user    0m0.125s
sys     0m0.000s
bash-3.2$ time grep -xFvf ids.log.local ids.log.remote > /dev/null

real    0m0.131s
user    0m0.128s
sys     0m0.002s
bash-3.2$ time awk 'NR == FNR {
  orderIds[$0]; next
  }
!($0 in orderIds)
  ' ids.log.local ids.log.remote > /dev/null

real    0m0.053s
user    0m0.051s
sys     0m0.003s
bash-3.2$ time awk 'NR == FNR {
  orderIds[$0]; next
  }
!($0 in orderIds)
  ' ids.log.local ids.log.remote > /dev/null

real    0m0.052s
user    0m0.051s
sys     0m0.001s
bash-3.2$ time awk 'NR == FNR {
  orderIds[$0]; next
  }
!($0 in orderIds)
  ' ids.log.local ids.log.remote > /dev/null

real    0m0.053s
user    0m0.051s
sys     0m0.002s
bash-3.2$ time awk -v compareTo="ids.log.local" -f checkids.awk ids.log.remote > /dev/null

real    0m0.066s
user    0m0.060s
sys     0m0.006s
bash-3.2$ time awk -v compareTo="ids.log.local" -f checkids.awk ids.log.remote > /dev/null

real    0m0.065s
user    0m0.058s
sys     0m0.008s
bash-3.2$ time awk -v compareTo="ids.log.local" -f checkids.awk ids.log.remote > /dev/null

real    0m0.061s
user    0m0.053s
sys     0m0.007s

@Dimitre Radoulov Looks like your awk is faster. Thanks.

Best Answer

I believe this is the most efficient way to do this in awk:

awk 'NR == FNR {
  orderIds[$0]; next
  }
!($0 in orderIds)
  ' ids.log.remote ids.log.local
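On the HashSet part of the question: awk has only associative arrays, but an array whose keys are used and whose values are ignored behaves like a set. A bare reference such as orderIds[$0] creates the key, and the in operator tests membership without creating anything. By contrast, a test like orderIds[$0] == "" creates the key as a side effect, so the array grows with every lookup. A minimal sketch of that difference; the array name seen, the helper count, and the sample keys are made up for illustration:

```shell
result=$(awk '
# Hypothetical helper: number of keys currently stored in arr.
function count(arr,   k, n) { n = 0; for (k in arr) n++; return n }
BEGIN {
    seen["a"]                                   # bare reference: adds key "a" with an empty value
    print (("a" in seen) ? "a: member" : "a: absent")
    print (("b" in seen) ? "b: member" : "b: absent")
    print "keys: " count(seen)                  # the "in" tests added nothing
    if (seen["b"] == "") print "b looks empty"  # this comparison creates key "b"
    print "keys: " count(seen)
}')
printf '%s\n' "$result"
```

The second count is one higher than the first, only because of the == "" comparison.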

You may try with grep too:

grep -xFvf ids.log.remote ids.log.local
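To check that the two approaches agree, here is a small sketch on throwaway data. The IDs and the temporary directory are invented for the example; only the file names and the two commands come from the posts above. With grep, -x matches whole lines, -F treats the patterns as fixed strings, -v inverts the match, and -f reads the patterns from a file.

```shell
dir=$(mktemp -d)
printf '%s\n' 101 102 103 > "$dir/ids.log.remote"
printf '%s\n' 102 104 103 105 > "$dir/ids.log.local"

# First file fills the set; lines of the second file not in the set are printed.
awk_out=$(awk 'NR == FNR { orderIds[$0]; next } !($0 in orderIds)' \
    "$dir/ids.log.remote" "$dir/ids.log.local")

# Same question asked of grep: lines of local that match no line of remote.
grep_out=$(grep -xFvf "$dir/ids.log.remote" "$dir/ids.log.local")

printf 'awk:\n%s\ngrep:\n%s\n' "$awk_out" "$grep_out"
rm -rf "$dir"
```

Both commands print the IDs that appear only in ids.log.local. One caveat with the awk form: if the first file is empty, NR == FNR stays true for the second file, so nothing is printed.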

Related Solutions

linux cron – Email Only Occasionally Sent on Output and Errors

Upon further testing, I suspect the & is messing with your results. As you point out, &>/dev/null is bash syntax, not sh syntax. As a result, sh is creating a subshell and backgrounding it. Sure, the subshell's echo creates stderr, but my theory is that:

cron is not catching the subshell's stderr, and
the backgrounding of the subshell always completes successfully, thus bypassing your || echo ....

... causing the cron job to have no output and thus no mail. Based on my reading of the vixie-cron source, it would seem that the job's stderr and stdout would be captured by cron, but it must be getting lost by the subshell.

Test it yourself in a /bin/sh environment (assuming you do not have a file named 'bar' here):

(grep foo bar) &
echo $?
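The same effect can be sketched without grep or cron, using false as a stand-in for the failing command (an assumption for illustration, not the original job). After &, the status is that of launching the background job, so an || fallback never fires. The portable >/dev/null 2>&1 redirection keeps the command in the foreground and preserves its status.

```shell
# Backgrounded: $? reflects the successful launch, not the command's result.
(false) &
bg_status=$?
wait    # let the background job finish before continuing

# Foreground with POSIX redirection: the failure is visible to ||.
fg_result=ok
false >/dev/null 2>&1 || fg_result=failed

printf 'background status: %s\nforeground result: %s\n' "$bg_status" "$fg_result"
```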

Bash – How to Move Last 3 Lines of Pipeline to Top of Output with ed

You have actually stumbled on the shell, not on ed. This

echo -e "$-2,$m0\n,p\nQ"

means $- and $m0 undergo parameter expansion, as they are enclosed by double quotes. Run echo "$-" and echo "$m0" to see it for yourself. They should be enclosed in single quotes so that the shell does not expand them.
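The difference can be seen directly by storing both forms in variables. This is only a sketch: the variable names double and single are made up, and m0 is set to the empty string here to stand in for the normally unset variable.

```shell
m0=''                # stands in for the normally unset variable m0
double="$-2,$m0"     # the shell expands $- (current option flags) and $m0
single='$-2,$m0'     # left untouched, which is what ed needs to see
printf 'double-quoted: %s\nsingle-quoted: %s\n' "$double" "$single"
```

The double-quoted value depends on the shell's current option flags, so it varies between environments; the single-quoted value is always the literal ed address.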

Since we are fixing it, let us also favor printf over echo. The latter behaves inconsistently across implementations, while the former is well specified. This should do:

printf '%s\n' '$-2,$m0' ',p' 'Q' | ed -s <(program | awk ...)

The -s option has been added to ed to "suppress diagnostics, byte counts and '!' prompt". This is purely cosmetic.

Sample execution (with a useless use of cat to simulate the process substitution):

$ cat input
ATOM    126  CD  GLN A 449      -2.853  11.592 119.709  1.00 17.95           C
ATOM    127  OE1 GLN A 449      -4.056  11.297 119.695  1.00 20.83           O
ATOM    128  NE2 GLN A 449      -1.948  10.876 120.359  1.00 14.98           N
HETATM  129  N   MSE A 450      -4.523  16.830 119.280  1.00 14.88           N
HETATM  130  CA  MSE A 450      -5.537  17.804 118.911  1.00 15.65           C

$ printf '%s\n' '$-2,$m0' ',p' 'Q' | ed -s <(cat input)
ATOM    128  NE2 GLN A 449      -1.948  10.876 120.359  1.00 14.98           N
HETATM  129  N   MSE A 450      -4.523  16.830 119.280  1.00 14.88           N
HETATM  130  CA  MSE A 450      -5.537  17.804 118.911  1.00 15.65           C
ATOM    126  CD  GLN A 449      -2.853  11.592 119.709  1.00 17.95           C
ATOM    127  OE1 GLN A 449      -4.056  11.297 119.695  1.00 20.83           O
