Create a field that can store values calculated from values in another file

awktext processing

I have two files, the first one (tab-delimited) looks like:

1    100    371    R1,R2,R4    12
5    167    16     R2,R5       5
8    242    490    R1,R3,R4    11

another looks like:

R1   0.167
R2   0.171
R3   0.156
R4   0.162
R5   0.159

I want to add one more field in the first file, the values in this field are the sum of the matching values of R1, R2,…,R5 in the second file then divided by the values in the fourth field.

For example, the first line has R1, R2, R4, so the value I want is (0.167+0.171+0.162)/12 = 0.0416667

Expected output:

1    100    371    R1,R2,R4    12    0.0416667
5    167    16     R2,R5       5     0.066
8    242    490    R1,R3,R4    11    0.0440909

How to write the awk command?

Best Answer

awk solution:

awk 'NR==FNR{ a[$1]=$2; next };{ len=split($4,b,","); s=0; 
     for(i=1;i<=len;i++) s+=a[b[i]]; $6=s/$5 }1' file2 OFS='\t' file1 | column -tx

The output:

1  100  371  R1,R2,R4  12  0.0416667
5  167  16   R2,R5     5   0.066
8  242  490  R1,R3,R4  11  0.0440909

  • a[$1]=$2 - capturing keys/values from the 2nd file

  • split($4,b,",") - splitting the 4th field of the 1st file into array of "keys"

  • len - number of chunks

  • s+=a[b[i]] - accumulating values for the matched "keys"

Related Question