Matching Five Columns in two Files using Awk

awk, sed

I have two input files.

File1:

s2/80   20      .       A       T       86      F=5;U=4
s2/20   10      .       G       T       90      F=5;U=4
s2/90   60      .       C       G       30      F=5;U=4

File2:

s2/90   60      .       G       G       97      F=5;U=4
s2/80   20      .       A       A       20      F=5;U=4
s2/15   11      .       A       A       22      F=5;U=4
s2/90   21      .       C       C       82      F=5;U=4
s2/20   10      .       G       .       99      F=5;U=4
s2/80   10      .       T       G       11      F=5;U=4
s2/90   60      .       G       T       55      F=5;U=4

Expected Output:

s2/80  20 . A   T   86  F=5;U=4  s2/80  20  . A   A   20     F=5;U=4
s2/20  10 . G   T   90  F=5;U=4  s2/20  10  . G   .   99     F=5;U=4

Logic:
I want each line of File1 concatenated with its matching line from File2 in the output file.
Conditions:
Columns 1, 2, and 4 of File1 and File2 must match exactly, and column 5 of File2 must either be a dot (".") or match column 4 of File2 exactly.

Code:
I tried using the script:

BEGIN{}
FNR==NR{
k=$1" "$2
a[k]=$4" "$5
b[k]=$0
c[k]=$4
d[k]=$5
next
}

{ k=$1" "$2
lc=c[k]
ld=d[k]
# file1 file2
if ((k in a) && ($4==$5) && (lc==$4)) print b[k]" "$0
}

But I get this output:

s2/80  20 . A   T   86  F=5;U=4  s2/80  20  . A   A   20     F=5;U=4

whereas my output should be:

s2/80  20 . A   T   86  F=5;U=4  s2/80  20  . A   A   20     F=5;U=4
s2/20  10 . G   T   90  F=5;U=4  s2/20  10  . G   .   99     F=5;U=4

I would appreciate your help.
Thanks.

Best Answer

awk '
    {
        key = $1 SUBSEP $2 SUBSEP $4
    }
    # here, we are reading file1
    NR == FNR {
        f1_line[key] = $0 
        next
    }
    # here, we are reading file2
    key in f1_line && ($5 == "." || $5 == $4) {
        print f1_line[key], $0
    }
' file1 file2

outputs

s2/80   20      .       A       T       86      F=5;U=4 s2/80   20      .       A       A       20      F=5;U=4
s2/20   10      .       G       T       90      F=5;U=4 s2/20   10      .       G       .       99      F=5;U=4
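
For reference, the script in the question drops the second line because its test requires $4==$5 on the File2 line and never allows a "." in column 5. Below is a minimal sketch of that script with only this condition changed. It recreates the sample data in a temporary directory; the names tmp and result, and the file names file1 and file2, are just placeholders for this illustration. With the sample data it should print the two expected lines.

```shell
# Recreate the two sample inputs from the question in a scratch directory.
tmp=$(mktemp -d)
cat > "$tmp/file1" <<'EOF'
s2/80   20      .       A       T       86      F=5;U=4
s2/20   10      .       G       T       90      F=5;U=4
s2/90   60      .       C       G       30      F=5;U=4
EOF
cat > "$tmp/file2" <<'EOF'
s2/90   60      .       G       G       97      F=5;U=4
s2/80   20      .       A       A       20      F=5;U=4
s2/15   11      .       A       A       22      F=5;U=4
s2/90   21      .       C       C       82      F=5;U=4
s2/20   10      .       G       .       99      F=5;U=4
s2/80   10      .       T       G       11      F=5;U=4
s2/90   60      .       G       T       55      F=5;U=4
EOF

# The question's script, trimmed to the arrays it actually uses,
# with one change: column 5 of file2 may also be ".".
result=$(awk '
    FNR == NR {                 # first file: remember the line and its column 4
        k = $1 " " $2
        b[k] = $0
        c[k] = $4
        next
    }
    {                           # second file: test against the stored file1 data
        k = $1 " " $2
        if ((k in b) && c[k] == $4 && ($5 == "." || $5 == $4))
            print b[k] " " $0
    }
' "$tmp/file1" "$tmp/file2")

printf '%s\n' "$result"
rm -rf "$tmp"
```

One difference from the answer above: this version keys the array on columns 1 and 2 only, so if File1 contained two lines with the same columns 1 and 2 but different values in column 4, the later line would overwrite the earlier one. Putting $4 into the key, as the answer does, avoids that.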