Replace field in one file based on match with a different field in another file

text processing

I'd appreciate any help with this task:

File A has 4 fields and 90K lines. The first field (column 1) holds a value that needs to be changed if the criteria below are met. The fourth field (column 4) holds the data referenced in File B. The data in File A are tab-separated DNS records:

Owner IN Type RData

FileA:

hostname1 IN A 10.10.20.1 
hostname2 IN A 10.10.20.2 
hostname3 IN A 10.10.20.3

FileB:

10.10.20.1 
10.10.20.2 
10.10.20.58 
10.10.21.245 
10.10.23.7

File B is single column (one field) and 1400 lines. The data in File B are IP addresses.

Requirement:
For every line in File B, replace the contents of the first field in File A if the fourth field in File A matches.

In English:
For every IP that's listed in File B, replace the Owner value in File A with a specific value.

Best Answer

(modifying from my answer here...)

You can compare NR with FNR to distinguish between processing the first or the subsequent files. This is because FNR is reset per file, while NR is the running tally. Therefore, only during the processing of the first file will the condition NR==FNR be satisfied.
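To see the two counters side by side, here is a minimal sketch. The file names first and second and their contents are made up for illustration; they are not from the question.

```shell
# Two tiny throwaway files (hypothetical names) in a temp directory.
tmp=$(mktemp -d)
printf 'a\nb\n' > "$tmp/first"
printf 'c\nd\ne\n' > "$tmp/second"

# For every input line, print the file name, the running count (NR),
# the per-file count (FNR), and which branch NR==FNR would select.
nr_fnr=$(cd "$tmp" && awk '{ print FILENAME, NR, FNR, (NR==FNR ? "first file" : "later file") }' first second)
printf '%s\n' "$nr_fnr"
rm -rf "$tmp"
```

One caveat worth knowing: if the first file is empty, NR==FNR is also true while reading the second file, so the trick assumes the first file has at least one line.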

To process FileB first...

awk 'NR==FNR{a[$1]=1}'

Setting the value to a 'dummy' one like 1 is good enough.

Then, to process FileA...

awk -F'\t' 'BEGIN{OFS=FS}NR!=FNR{if(a[$4]){$1="Reserved"};print}'

Here, the output field separator OFS is set as FS to preserve the formatting when awk reconstructs the full line ($0).
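A quick way to convince yourself why OFS matters is to run the same assignment with and without it. This is only a sketch on a single record in the FileA layout:

```shell
# One tab-separated record, matching the FileA layout from the question.
line=$(printf 'hostname1\tIN\tA\t10.10.20.1')

# Default OFS (a single space): assigning to $1 rebuilds $0 with spaces.
without_ofs=$(printf '%s\n' "$line" | awk -F'\t' '{ $1="Reserved"; print }')

# OFS set to FS: the rebuilt record keeps its tabs.
with_ofs=$(printf '%s\n' "$line" | awk -F'\t' 'BEGIN{ OFS=FS } { $1="Reserved"; print }')

printf '%s\n' "$without_ofs" "$with_ofs"
```

Lines whose fields are never assigned are printed as they were read, so only the modified lines would lose their tabs without the OFS setting.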

Putting both together:

awk -F'\t' 'BEGIN{OFS=FS}NR==FNR{a[$1]=1}NR!=FNR{if(a[$4]){$1="Reserved"};print}' FileB FileA

If you want it to be slightly more terse...

awk -F'\t' 'BEGIN{OFS=FS}NR==FNR{a[$1]=1;next}a[$4]{$1="Reserved"}1' FileB FileA

next is used in the processing of the first file to skip the final printing indicated by (the final) 1, which is shorthand for {print $0}. With this, we can move the test a[$4] (i.e. true if $4 exists as a key in a) 'out' to be the pattern that decides whether $1 needs to be changed to "Reserved" or not.
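For a quick check against the sample data in the question, something like the following should do. It assumes FileA really is tab separated and that FileB has no trailing whitespace or carriage returns; the temp directory is just scaffolding for the test.

```shell
tmp=$(mktemp -d)
# Sample data from the question: FileA is tab separated, FileB has one IP per line.
printf 'hostname1\tIN\tA\t10.10.20.1\nhostname2\tIN\tA\t10.10.20.2\nhostname3\tIN\tA\t10.10.20.3\n' > "$tmp/FileA"
printf '10.10.20.1\n10.10.20.2\n10.10.20.58\n10.10.21.245\n10.10.23.7\n' > "$tmp/FileB"

# The terse one-liner, unchanged apart from the file paths.
result=$(awk -F'\t' 'BEGIN{OFS=FS}NR==FNR{a[$1]=1;next}a[$4]{$1="Reserved"}1' "$tmp/FileB" "$tmp/FileA")
printf '%s\n' "$result"
rm -rf "$tmp"
```

With this input, hostname1 and hostname2 become Reserved and hostname3 is left alone. Because FS is a tab, a trailing space or a Windows line ending in FileB would become part of the key and silently prevent matches, so it may be worth cleaning FileB first. If memory is a concern, ($4 in a) can be used instead of a[$4]: it tests membership without creating an empty entry for every unmatched IP.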

Related Solutions

AWK – Joining Multiple Columns from Different Files

$ cat tst.awk
BEGIN { FS=OFS="\t" }
{ datasets[$1]; fnames[FILENAME]; vals[$1,FILENAME] = $2 }
END {
    printf "%s", "dataset"
    for (fname in fnames) {
        printf "%s%s", OFS, fname
    }
    print ""
    for (dataset in datasets) {
        printf "%s", dataset
        for (fname in fnames) {
            printf "%s%s", OFS, vals[dataset,fname]
        }
        print ""
    }
}

$ tail -n +1 file?
==> file1 <==
a       1
b       2
c       3

==> file2 <==
a       2
c       3

$ awk -f tst.awk file1 file2
dataset file1   file2
a       1       2
b       2
c       3       3

Add as many files to the list as you like.

awk – How to Replace a String in One File if a Pattern is Present in Another File

awk -F'\t' '
  NR==FNR{ if ($5=="ViroHEX"){ viro=1 } next }
  viro && $1=="Software Version"{ $2="VIRO_v1" }
  1
' A.txt FS=" = " OFS=" = " B.txt > result.txt

This replaces the second field (NOVA_v1) with VIRO_v1 in the second file if the first field equals Software Version and ViroHEX is present anywhere in the 5th column of the first file.

I'm assuming the field separator of the second file is <space>=<space> (not a tab).
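To try this out without the original files, here is a sketch with made-up inputs. The contents of A.txt and B.txt below are hypothetical; only the awk program and the FS/OFS assignments between the file names are taken from the answer.

```shell
tmp=$(mktemp -d)
# Hypothetical inputs: A.txt is tab separated with ViroHEX in column 5,
# B.txt uses " = " between key and value.
printf 'x\tx\tx\tx\tViroHEX\n' > "$tmp/A.txt"
printf 'Software Version = NOVA_v1\nOther Key = unchanged\n' > "$tmp/B.txt"

# FS and OFS given between the file names take effect once awk has
# finished A.txt and before it starts reading B.txt.
replaced=$(awk -F'\t' '
  NR==FNR{ if ($5=="ViroHEX"){ viro=1 } next }
  viro && $1=="Software Version"{ $2="VIRO_v1" }
  1
' "$tmp/A.txt" FS=" = " OFS=" = " "$tmp/B.txt")
printf '%s\n' "$replaced"
rm -rf "$tmp"
```

The variable assignments on the command line are what let each file be split with its own separator in a single awk run.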
