Text Processing – Add Column Based on Matching of Second Column

columnsfilesjoin;mergetext processing

I have two files. The first file has the following format:

10D0325 2465 0 0 -9 -9
10D0598 2567 0 0 -9 -9
10D0562 2673 0 0 -9 -9
10D0175 2457 0 0 -9 -9
10D0241 2209 0 0 -9 -9
10D0954 2312 0 0 -9 -9
10D0446 2489 0 0 -9 -9

The second file has this format:

10D0325 1
10D0598 1
10D0175 2
10D0954 1
10D0446 2

What I want to do is add the second column of the second file to the first file, based on the ID variable.
As you can see the first column can be used as an identifier variable to match the first dataset with the second. However, the first file contains some lines/ID's that are not present in the second file. Therefore I cannot simply order both files and paste this column into the first file.

There must be a fairly easy way to do this, unfortunately my Linux skills are limited.

P.S. For the sake of clarity, this is what I would like the resulting file to look like (any other symbol may be used to indicate missings instead of blank spaces):

10D0325 2465 0 0 -9 -9 1
10D0598 2567 0 0 -9 -9 1
10D0562 2673 0 0 -9 -9
10D0175 2457 0 0 -9 -9 2
10D0241 2209 0 0 -9 -9
10D0954 2312 0 0 -9 -9 1
10D0446 2489 0 0 -9 -9 2

Best Answer

That's pretty easy with awk:

awk 'NR==FNR{a[$1]=$2;next}{print $0,a[$1]}' file2 file1

First (when file2 is being read) we create an array a which stores second column from file2, indexed with first column. And then we print file1 adding value from an array.

Related Question