awk '
    {
        key = $1 SUBSEP $2 SUBSEP $4
    }
    # here, we are reading file1
    NR == FNR {
        f1_line[key] = $0
        next
    }
    # here, we are reading file2
    key in f1_line && ($5 == "." || $5 == $4) {
        print f1_line[key], $0
    }
' file1 file2
outputs:
s2/80 20 . A T 86 F=5;U=4 s2/80 20 . A A 20 F=5;U=4
s2/20 10 . G T 90 F=5;U=4 s2/20 10 . G . 99 F=5;U=4
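To try this without the original data, here is a minimal sketch that rebuilds two small input files from the lines visible in the output above. The temporary directory and the two extra records (s2/50 and s2/99) are assumptions added only to show which lines get filtered out; they are not from the question.

```shell
#!/bin/sh
# Throwaway directory for the sample files (an assumption; any path works).
tmpdir=$(mktemp -d)

# file1: the two lines seen in the output, plus a hypothetical s2/50 record.
cat > "$tmpdir/file1" <<'EOF'
s2/80 20 . A T 86 F=5;U=4
s2/20 10 . G T 90 F=5;U=4
s2/50 15 . C T 70 F=5;U=4
EOF

# file2: s2/50 has a matching key but $5 is neither "." nor $4, so it is dropped;
# s2/99 (hypothetical) has no matching key in file1, so it is dropped too.
cat > "$tmpdir/file2" <<'EOF'
s2/80 20 . A A 20 F=5;U=4
s2/20 10 . G . 99 F=5;U=4
s2/50 15 . C G 70 F=5;U=4
s2/99 30 . C C 50 F=5;U=4
EOF

result=$(awk '
    # Build the lookup key from fields 1, 2 and 4 for every line of both files.
    { key = $1 SUBSEP $2 SUBSEP $4 }
    # NR == FNR is only true while the first file is being read.
    NR == FNR { f1_line[key] = $0; next }
    # Second file: print only when the key exists and field 5 is "." or equals field 4.
    key in f1_line && ($5 == "." || $5 == $4) { print f1_line[key], $0 }
' "$tmpdir/file1" "$tmpdir/file2")

printf '%s\n' "$result"
```

With these sample files, only the s2/80 and s2/20 pairs should be printed, matching the two output lines shown above.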
This should do it:
join -j 2 -o 1.1,1.2,1.3,2.3 file1 file2
Important: this assumes your files are sorted (as in your example) by SNP name, which is the second field. If they are not, sort them first on that field only:
join -j 2 -o 1.1,1.2,1.3,2.3 <(sort -k2,2 file1) <(sort -k2,2 file2)
Output:
0 AFFX-SNP-000541 NA 1
0 AFFX-SNP-002255 NA 1
1 rs12103 0.6401 0.5596
1 rs12103_1247494 0.696 0.5581
1 rs12142199 0.7672 0.4931
Explanation (from info join):
`join' writes to standard output a line for each pair of input lines
that have identical join fields.
`-1 FIELD'
    Join on field FIELD (a positive integer) of file 1.
`-2 FIELD'
    Join on field FIELD (a positive integer) of file 2.
`-j FIELD'
    Equivalent to `-1 FIELD -2 FIELD'.
`-o FIELD-LIST'
    Otherwise, construct each output line according to the format in
    FIELD-LIST. Each element in FIELD-LIST is either the single
    character `0' or has the form M.N where the file number, M, is `1'
    or `2' and N is a positive field number.
So, the command above joins the files on the second field and prints the 1st, 2nd, and 3rd fields of file1, followed by the 3rd field of file2.
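As a rough, self-contained illustration of the above, the sketch below builds two unsorted sample files, sorts each on the second field, and joins them. The first three fields of file1 and the third field of file2 are taken from the output shown earlier; the first column of file2, the unmatched rs99999999 line, the temporary directory, and the use of LC_ALL=C (so that sort and join agree on ordering) are assumptions made for the example.

```shell
#!/bin/sh
# Throwaway directory for the sample files (an assumption; any path works).
tmpdir=$(mktemp -d)

# file1: deliberately out of order to show why the sort step matters.
cat > "$tmpdir/file1" <<'EOF'
1 rs12103 0.6401
0 AFFX-SNP-000541 NA
1 rs12142199 0.7672
0 AFFX-SNP-002255 NA
1 rs12103_1247494 0.696
EOF

# file2: first column is hypothetical; rs99999999 has no partner in file1.
cat > "$tmpdir/file2" <<'EOF'
1 rs99999999 0.1234
1 rs12142199 0.4931
0 AFFX-SNP-002255 1
1 rs12103 0.5596
1 rs12103_1247494 0.5581
0 AFFX-SNP-000541 1
EOF

# Sort each file on field 2 only (-k2,2), using the same collation join will use.
LC_ALL=C sort -k2,2 "$tmpdir/file1" > "$tmpdir/file1.sorted"
LC_ALL=C sort -k2,2 "$tmpdir/file2" > "$tmpdir/file2.sorted"

# Join on field 2 of both files; print fields 1-3 of file1, then field 3 of file2.
result=$(LC_ALL=C join -j 2 -o 1.1,1.2,1.3,2.3 \
    "$tmpdir/file1.sorted" "$tmpdir/file2.sorted")

printf '%s\n' "$result"
```

Writing the sorted copies to temporary files instead of using `<(...)` keeps the sketch runnable in a plain POSIX shell. By default, join prints only lines that have a partner in both files, so the rs99999999 line does not appear in the result.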
Best Answer
Using paste: