Text Processing – Compare 1st Column of 1st File and 2nd Column of 2nd File

Tags: awk, grep, r, text-processing

File1: Excel file (.xls)

UN          ID    St      M1    M2       SE    DOF  PV        PA            FC
17127159    0   -5.9    297.3   765.7   0.22    4   0.003   0.00389231  2.57536
17127163    2   -3.87   189.914 492.307 0.3548  4   0.0179  0.01795     2.59226
17127167    4   -3.8908 339.136 855.276 0.3429  4   0.0176  0.017       2.52192
17127171    6   -3.922  390.44  986.365 0.340   4   0.0172179   0.01721 2.52627
17127175    8   -4.715  536.072 1210.65 0.2492  4   0.00920158  0.00920 2.258

File2: Text file (.txt)

UNIT_ID   UN      TID        X       E       GG7     J     O
0      17127159 16657436 353.568 335.295 221.717 815.654 684.85
1      17127161 16657436 11.0842 7.01459 7.33511 11.2121 12.6268
2      17127163 16657450 221.647 226.774 136.274 431.32  392.533
3      17127165 16657452 5.02182 3.41172 4.12834 6.90306 4.91183

If the 1st column of the 1st file matches the 2nd column of the 2nd file, extract columns 3 through 9 from the matching rows of the 2nd file and append them to the corresponding rows of the 1st file. The output should be saved in a new file.

Can anyone help me?

output:

UN        ID   St  M1    M2    SE   DOF PV    PA    FC    TID     X  E  GG7  J O
17127159  0   -5.9  297.3   765.7   0.22    4   0.003   0.00389231  2.57536  16657436 353.568 335.295 221.717 815.654 684.85

Best Answer

An awk solution:

$ awk 'NR==FNR{a[$2]=$3"\t"$4"\t"$5"\t"$6"\t"$7"\t"$8"\t"$9; next} 
              {
                if($1 in a){
                    print $0,a[$1]
                }
               }' file2 file1
UN          ID    St      M1    M2       SE    DOF  PV        PA            FC TID  X   E   GG7 J   O   
17127159    0   -5.9    297.3   765.7   0.22    4   0.003   0.00389231  2.57536 16657436    353.568 335.295 221.717 815.654 684.85  
17127163    2   -3.87   189.914 492.307 0.3548  4   0.0179  0.01795     2.59226 16657450    221.647 226.774 136.274 431.32  392.533

Explanation

Awk splits each input line into fields (at whitespace, by default), making the 1st field $1, the 2nd $2, and so on. The special variable NR is the number of input lines read so far across all files, and FNR is the line number within the file currently being read. Therefore, when processing multiple files, the two are equal only while the first file is being read (provided that file is not empty).
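To see the difference between NR and FNR in practice, here is a minimal sketch that prints both counters while awk reads two small throwaway files. The file names first.txt and second.txt and the variable nr_fnr are made up for this illustration.

```shell
# Create two tiny sample files (hypothetical names) in a temporary directory.
tmpdir=$(mktemp -d)
printf 'a\nb\n' > "$tmpdir/first.txt"
printf 'c\nd\ne\n' > "$tmpdir/second.txt"

# For every input line, print the current file name, NR and FNR.
# The subshell cd keeps the file names short in the output.
nr_fnr=$(cd "$tmpdir" && awk '{ print FILENAME, NR, FNR }' first.txt second.txt)
printf '%s\n' "$nr_fnr"
```

NR keeps counting across both files (1 to 5), while FNR restarts at 1 when second.txt begins, so the condition NR==FNR holds only for the two lines of first.txt.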

NR==FNR{a[$2]=$3"\t"$4"\t"$5"\t"$6"\t"$7"\t"$8"\t"$9; next} : if we're reading the first file, save fields 3 through 9 (joined by tabs) as the value in the array a whose key is the 2nd field. Then skip to the next line. The sample file2 shown has only 8 columns, so $9 is empty there and simply leaves a trailing tab.
The next ensures that the rest of the script is not run for the first file on the command line (file2) but only for the second (file1).
if($1 in a){ print $0,a[$1] } : we're now in the second file (file1). If the first field exists as a key in the a array (if($1 in a)), print the current line $0 followed by the value stored in a for $1, that is, fields 3 through 9 of the matching file2 line. The header lines are merged the same way, because UN is both the 2nd field of file2's header and the 1st field of file1's header.
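Since the question asks for the result in a new file, here is a hedged variant of the same idea. It assumes the Excel sheet has first been exported to whitespace-separated text (awk cannot read a binary .xls file directly), it collects every field from the 3rd onward with a loop instead of listing them, and it strips a trailing carriage return in case the export used Windows line endings. The names file1.txt, file2.txt, merged.txt and the array name extra are placeholders, and the sample data is abbreviated from the question.

```shell
tmpdir=$(mktemp -d)

# Abbreviated, whitespace-separated samples based on the question.
cat > "$tmpdir/file1.txt" <<'EOF'
UN ID St
17127159 0 -5.9
17127163 2 -3.87
17127167 4 -3.8908
EOF
cat > "$tmpdir/file2.txt" <<'EOF'
UNIT_ID UN TID X
0 17127159 16657436 353.568
1 17127161 16657436 11.0842
2 17127163 16657450 221.647
EOF

awk '
    { sub(/\r$/, "") }            # drop a trailing CR, if any
    NR == FNR {                   # first file on the command line (file2.txt)
        val = $3
        for (i = 4; i <= NF; i++) # append fields 4..NF, tab-separated
            val = val "\t" $i
        extra[$2] = val           # key: 2nd column of file2.txt
        next
    }
    $1 in extra { print $0 "\t" extra[$1] }   # file1.txt lines with a match
' "$tmpdir/file2.txt" "$tmpdir/file1.txt" > "$tmpdir/merged.txt"

cat "$tmpdir/merged.txt"
```

With this sample, merged.txt contains the header and the rows for 17127159 and 17127163; the row for 17127167 is dropped because it has no match in file2.txt.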

Related Solutions

Compare two files and print matches – large files

If the files are sorted (the samples you posted are) then it's as simple as

join -t : File1.txt File2.txt

join pairs up lines from two files where the join field is equal. By default, the join field is the first field, the fields are output in order except that the join field is not repeated, and non-pairable lines are skipped, which is exactly what you want.
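When the join field is not the first field in both files, join's -1 and -2 options select it per file. As a rough sketch, this is how that could look for data shaped like the first question on this page (1st column of one file against the 2nd column of the other). The file names f1.txt and f2.txt are hypothetical, the header lines are left out to keep the sort simple, and LC_ALL=C is set so that sort and join agree on the ordering.

```shell
tmpdir=$(mktemp -d)
printf '%s\n' '17127159 0 -5.9' '17127163 2 -3.87' '17127167 4 -3.8908' > "$tmpdir/f1.txt"
printf '%s\n' '0 17127159 16657436' '1 17127161 16657436' '2 17127163 16657450' > "$tmpdir/f2.txt"

# Sort each file on its own join field (leading blanks ignored via the b modifier).
LC_ALL=C sort -k 1b,1 "$tmpdir/f1.txt" > "$tmpdir/f1.sorted"
LC_ALL=C sort -k 2b,2 "$tmpdir/f2.txt" > "$tmpdir/f2.sorted"

# Join field: column 1 of the first file, column 2 of the second file.
joined=$(LC_ALL=C join -1 1 -2 2 "$tmpdir/f1.sorted" "$tmpdir/f2.sorted")
printf '%s\n' "$joined"
```

Each output line starts with the join field, followed by the remaining fields of the first file and then the remaining fields of the second file. The -o option can be used to choose exactly which fields are printed.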

Note that if the files have Windows line endings, they appear under Unix systems to have an extra carriage return (CR) character at the end of each line. The CR is usually invisible, but as far as join and other text tools are concerned, it's a character like any other. It means that the last field on each line of File1.txt ends with a CR whereas the ones in File2.txt don't, so they don't match. You need to strip the CR, at least in File1.txt.

<File1.txt tr -d '\r' | join -t : - File2.txt
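If you are not sure whether a file has Windows line endings, you can count the lines that end in a CR before deciding. The sketch below uses two throwaway files (the names dos.txt and unix.txt are invented for the example) and shows the join coming back empty until the CRs are stripped.

```shell
tmpdir=$(mktemp -d)
printf 'a\r\nb\r\n' > "$tmpdir/dos.txt"    # CRLF line endings
printf 'a:x\nb:y\n' > "$tmpdir/unix.txt"   # LF line endings

# Count the lines that end in a carriage return.
cr_lines=$(awk '/\r$/ { n++ } END { print n + 0 }' "$tmpdir/dos.txt")
echo "lines ending in CR: $cr_lines"

# Without stripping, the keys "a\r" and "a" differ, so nothing is paired.
before=$(join -t : "$tmpdir/dos.txt" "$tmpdir/unix.txt")
# After stripping the CRs, the lines pair up.
after=$(tr -d '\r' < "$tmpdir/dos.txt" | join -t : - "$tmpdir/unix.txt")
printf '%s\n' "$after"
```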

The files do need to be sorted on the join field. If they aren't, then in ksh, bash or zsh you can use process substitution. (Add tr -d '\r' | if needed.)

join -t : <(sort File1.txt) <(sort File2.txt)

In plain sh, if your Unix variant has /dev/fd (most do), you can use that instead to pipe the output of two programs through two file descriptors.

sort File2.txt | { sort File1.txt | join -t : /dev/fd/0 /dev/fd/3; } 3<&0

If you need to preserve the original order of File1.txt and it isn't sorted by the join field, then add line numbers to remember the original order, sort by the join field, join, sort by line numbers and strip the line numbers. (You can do something similar if you want to preserve the order of the other file.)

<File1.txt nl -s : |
sort -t : -k 2,2 |
join -t : -1 2 - <(sort File2.txt) |
sort -t : -k 2,2n |
cut -d : -f 1,3-
