I posted something similar a while ago and thought the code provided there would help solve my problem, but unfortunately I am not able to adapt it to my needs: awk – compare files and print lines from both files
Again, I have two tab-separated files.
file_1.txt
apple 2.5 5 7.2
great 3.8 10 3.6
see 7.6 3 4.9
tree 5.4 11 5
back 8.9 2 2.1
file_2.txt
apple :::N
back :::ADJ
back :::N
around :::ADV
great :::ADJ
bee :::N
see :::V
tree :::N
The output should look like:
apple :::N 2.5 5 7.2
great :::ADJ 3.8 10 3.6
back :::ADJ 8.9 2 2.1
back :::N 8.9 2 2.1
see :::V 7.6 3 4.9
tree :::N 5.4 11 5
The difference from the other post is that I only want to compare the first columns of file_1.txt and file_2.txt, and then print the whole line of file_1.txt together with column 2 of file_2.txt to the outfile. I do not care where $2 of file_2.txt ends up in the output line, so the outfile could just as well look like
back 8.9 2 2.1 :::N
back 8.9 2 2.1 :::V etc.
The problem is the duplicates in column 1, such as back here. Otherwise I could of course just use paste.
The problem with this awk command is that it does not store column 2 in the array a, so when I tell it to print a[$2], there is nothing to print.
awk 'NR==FNR {a[$1]; next} $1 in a {print $0, a[$2]}' OFS='\t' file_2.txt file_1.txt > outfile.txt
I would greatly appreciate any help! Sorry if this is a stupid question; I seem to be completely stumped.
Best Answer
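If you have GNU awk (available from the repository via the package gawk), which supports multi-dimensional arrays, you could store each column 2 value of file_2.txt under its column 1 key in a two-dimensional array, then loop over those values for every matching line of file_1.txt.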
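A minimal sketch of that idea, assuming gawk 4.0 or later (true arrays of arrays) and the sample files from the question. The array name tags and the variables word and rest are just illustrative choices, and the printf lines only recreate the sample data:

```shell
# Recreate the tab-separated sample files from the question
printf 'apple\t2.5\t5\t7.2\ngreat\t3.8\t10\t3.6\nsee\t7.6\t3\t4.9\ntree\t5.4\t11\t5\nback\t8.9\t2\t2.1\n' > file_1.txt
printf 'apple\t:::N\nback\t:::ADJ\nback\t:::N\naround\t:::ADV\ngreat\t:::ADJ\nbee\t:::N\nsee\t:::V\ntree\t:::N\n' > file_2.txt

gawk -F'\t' -v OFS='\t' '
    # First file (file_2.txt): remember every tag seen for each word
    NR == FNR { tags[$1][$2]; next }
    # Second file (file_1.txt): print one line per tag stored for this word
    $1 in tags {
        word = $1
        rest = substr($0, length($1) + 2)   # everything after the first tab
        for (t in tags[word]) print word, t, rest
    }
' file_2.txt file_1.txt > outfile.txt
```

Words follow the order of file_1.txt, but the order in which gawk walks the tags of a duplicated word such as back is unspecified, so the two back lines may come out in either order.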
Otherwise, if output order is not important, the easiest solution is probably to use the join command instead.
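A rough sketch of the join approach, again on the sample files from the question. join expects both inputs sorted on the join field, so this version sorts into temporary files first; the .sorted file names and the tab variable are my own choices, and LC_ALL=C is set only so that sort and join agree on the collation order:

```shell
# Recreate the tab-separated sample files from the question
printf 'apple\t2.5\t5\t7.2\ngreat\t3.8\t10\t3.6\nsee\t7.6\t3\t4.9\ntree\t5.4\t11\t5\nback\t8.9\t2\t2.1\n' > file_1.txt
printf 'apple\t:::N\nback\t:::ADJ\nback\t:::N\naround\t:::ADV\ngreat\t:::ADJ\nbee\t:::N\nsee\t:::V\ntree\t:::N\n' > file_2.txt

export LC_ALL=C            # same collation for sort and join
tab=$(printf '\t')         # literal tab to use as the field separator

# join needs both inputs sorted on the join field (column 1)
sort -t "$tab" -k1,1 file_1.txt > file_1.sorted
sort -t "$tab" -k1,1 file_2.txt > file_2.sorted

# Output: key, then the rest of file_2's line, then the rest of file_1's line
join -t "$tab" file_2.sorted file_1.sorted > outfile.txt
```

Words that appear in only one of the files (around and bee here) are dropped, and the duplicated key back produces one output line per tag. The result is ordered by the sorted key rather than by the original order of file_1.txt.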