Create a field that can store values calculated from values in another file

awk, text-processing

I have two files, the first one (tab-delimited) looks like:

1    100    371    R1,R2,R4    12
5    167    16     R2,R5       5
8    242    490    R1,R3,R4    11

another looks like:

I want to add one more field to the first file. Its value is the sum of the values in the second file that match the keys listed in the fourth field (R1, R2, …, R5), divided by the value in the fifth field.

For example, the first line has R1, R2, R4, so the value I want is (0.167+0.171+0.162)/12 = 0.0416667

Expected output:

1    100    371    R1,R2,R4    12    0.0416667
5    167    16     R2,R5       5     0.066
8    242    490    R1,R3,R4    11    0.0440909

How to write the awk command?

Best Answer

awk solution:

awk 'NR==FNR{ a[$1]=$2; next };{ len=split($4,b,","); s=0; 
     for(i=1;i<=len;i++) s+=a[b[i]]; $6=s/$5 }1' file2 OFS='\t' file1 | column -tx

The output:

1  100  371  R1,R2,R4  12  0.0416667
5  167  16   R2,R5     5   0.066
8  242  490  R1,R3,R4  11  0.0440909

a[$1]=$2 - capturing keys/values from the 2nd file
split($4,b,",") - splitting the 4th field of the 1st file into array of "keys"
len - number of chunks
s+=a[b[i]] - accumulating values for the matched "keys"
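The one-liner above can be spread out with comments. This is only a sketch under stated assumptions: the temporary file names are hypothetical, the lookup file is assumed to hold one tab-separated key/value pair per line, and only the R1, R2 and R4 values from the question's worked example are used. It passes OFS with -v, whereas the original places OFS='\t' between the file names, which awk applies just before it starts reading file1.

```shell
# Hypothetical sample data: only the R1, R2 and R4 values come from the question.
tmpdir=$(mktemp -d)
printf 'R1\t0.167\nR2\t0.171\nR4\t0.162\n' > "$tmpdir/file2"
printf '1\t100\t371\tR1,R2,R4\t12\n' > "$tmpdir/file1"

result=$(awk -F'\t' -v OFS='\t' '
    # First file on the command line (the lookup file): remember key -> value.
    NR == FNR { a[$1] = $2; next }
    {
        len = split($4, b, ",")          # keys listed in the 4th field
        s = 0
        for (i = 1; i <= len; i++)
            s += a[b[i]]                 # unknown keys add 0
        $6 = s / $5                      # new 6th field: sum divided by the 5th field
        print
    }
' "$tmpdir/file2" "$tmpdir/file1")

printf '%s\n' "$result"
rm -rf "$tmpdir"
```

For this single line the result should end in 0.0416667, matching the worked example in the question.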

Related Solutions

AWK – How to Replace Content of Specific Column in Tab Delimited File

You need to set the output field separator (OFS) to a tab (\t).

One way to do it is with the -v option:

awk -vOFS='\t' '{$3 = "AD"; print}' file

Another possibility is to set it inside awk, for example in a BEGIN block:

awk 'BEGIN{OFS="\t"}{$3 = "AD"; print}' file

If you don't set OFS, awk uses a single space as the output field separator by default, so any line whose fields you modify is rebuilt with spaces instead of tabs.
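A quick way to see the difference is to run the same assignment with and without OFS. The input line below is made up for illustration.

```shell
# Hypothetical one-line, tab-delimited input.
line=$(printf 'a\tb\tc')

# Without OFS: assigning to $3 makes awk rebuild the line with single spaces.
default_out=$(printf '%s\n' "$line" | awk '{ $3 = "AD"; print }')

# With OFS set to a tab: the rebuilt line keeps tab separators.
tab_out=$(printf '%s\n' "$line" | awk -v OFS='\t' '{ $3 = "AD"; print }')

printf '%s\n' "$default_out"   # a b AD (space-separated)
printf '%s\n' "$tab_out"       # a, b, AD separated by tabs
```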

Replace field in one file based on match with a different field in another file

(modifying from my answer here...)

You can compare NR with FNR to distinguish between processing the first or the subsequent files. This is because FNR is reset per file, while NR is the running tally. Therefore, only during the processing of the first file will the condition NR==FNR be satisfied.
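To watch the two counters diverge, you can print them for two small files. The file names and contents here are hypothetical.

```shell
tmpdir=$(mktemp -d)
printf 'x\ny\n' > "$tmpdir/first"
printf 'p\nq\n' > "$tmpdir/second"

# Print both counters and which branch of the NR==FNR test each record takes.
nr_fnr=$(awk '{ print NR, FNR, (NR == FNR ? "first" : "later") }' \
    "$tmpdir/first" "$tmpdir/second")

printf '%s\n' "$nr_fnr"
# 1 1 first
# 2 2 first
# 3 1 later
# 4 2 later
rm -rf "$tmpdir"
```

One caveat: if the first file is empty, NR==FNR is also true while reading the second file, so the test no longer identifies the first file.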

To process FileB first...

awk 'NR==FNR{a[$1]=1}'

Setting the value to a 'dummy' one like 1 is good enough.

Then, to process FileA...

awk -F'\t' 'BEGIN{OFS=FS}NR!=FNR{if(a[$4]){$1="Reserved"};print}'

Here, the output field separator OFS is set as FS to preserve the formatting when awk reconstructs the full line ($0).

Putting both together:

awk -F'\t' 'BEGIN{OFS=FS}NR==FNR{a[$1]=1}NR!=FNR{if(a[$4]){$1="Reserved"};print}' FileB FileA

If you want it to be slightly more terse...

awk -F'\t' 'BEGIN{OFS=FS}NR==FNR{a[$1]=1;next}a[$4]{$1="Reserved"}1' FileB FileA

next is used while processing the first file to skip the final print triggered by the trailing 1, which is shorthand for {print $0}. With this, we can move the condition a[$4] (i.e. true if the key exists in a) 'out' of the action block, so that it becomes the pattern that decides whether $1 is changed to "Reserved" or not.
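Here is the terse version run against made-up data, as a sketch only: the contents of FileB (one key per line) and FileA (tab-delimited, key in the fourth field) are assumptions, since the original files are not shown.

```shell
tmpdir=$(mktemp -d)
# Hypothetical inputs: FileB lists keys, FileA carries the key in field 4.
printf 'k1\nk3\n' > "$tmpdir/FileB"
printf 'a\tb\tc\tk1\nd\te\tf\tk2\ng\th\ti\tk3\n' > "$tmpdir/FileA"

reserved=$(awk -F'\t' 'BEGIN{OFS=FS}NR==FNR{a[$1]=1;next}a[$4]{$1="Reserved"}1' \
    "$tmpdir/FileB" "$tmpdir/FileA")

printf '%s\n' "$reserved"
# Lines whose 4th field is k1 or k3 now start with "Reserved"; the k2 line is unchanged.
rm -rf "$tmpdir"
```

Testing a[$4] creates an empty entry for every key it looks up. That is harmless here, but writing the pattern as ($4 in a) avoids it.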
