awk – How to Replace a String in One File if a Pattern is Present in Another File

awk, text-processing

I have a data file A.txt (field separator = \t):

Well    Well Type    Well Name    Dye    Target
A1      Unknown      HIGH-001     FAM    ViroFAM
A1      Unknown      HIGH-001     HEX    ViroHEX

And a template file B.txt:

kit
Software Version = NOVA_v1
Date And Time of Export = 07/02/2020 13:44:11 UTC
Experiment Name =
Instrument Software Version =
Instrument Type = CFX
Instrument Serial Number =
Run Start Date =
Run End Date =
Run Operator =
Batch Status = VALID
Method = Novaprime
Date And Time of Export,Batch ID,Sample Name,Well,Sample Type,Status,Interpretive Result,Action*,Curve analysis
,taq,205920777.1,A01,Unkn-01
,taq,neg5,A02,Unkn-09
,,,,,,,,,,
*reporting.

I want to replace the value after the = in the second line of B.txt with VIRO_v1, but only when the pattern ViroHEX is present in the 5th column of A.txt.

To do that, I started with something like:

awk -F'\t' '
  FNR==NR{ a[NR]=$2; next }
  $1=="Software Version"{ print $0,"VIRO_v1"; next }
  1
' B.txt FS=" =" B.txt > result.txt

But I couldn't figure out the part with A.txt. Do you have an idea how to do that?

Best Answer

awk -F'\t' '
  NR==FNR{ if ($5=="ViroHEX"){ viro=1 } next }
  viro && $1=="Software Version"{ $2="VIRO_v1" }
  1
' A.txt FS=" = " OFS=" = " B.txt > result.txt

This replaces the second field (NOVA_v1) with VIRO_v1 in the second file if the first field equals Software Version and ViroHEX is present anywhere in the 5th column of the first file.

I'm assuming the field separator of the second file is <space>=<space> (not a tab).
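To try the command without the real data, here is a minimal sketch that rebuilds shortened versions of both files and runs it. The file contents are abbreviated from the question. The second command is a variant under my own assumption that only line 2 of B.txt should ever change; the name result_line2.txt is made up for the example.

```shell
# Sample inputs, shortened from the question (A.txt is tab-separated).
printf 'Well\tWell Type\tWell Name\tDye\tTarget\n'  > A.txt
printf 'A1\tUnknown\tHIGH-001\tFAM\tViroFAM\n'     >> A.txt
printf 'A1\tUnknown\tHIGH-001\tHEX\tViroHEX\n'     >> A.txt
printf 'kit\nSoftware Version = NOVA_v1\nInstrument Type = CFX\n' > B.txt

# The answer as given, with comments.
awk -F'\t' '
  NR==FNR { if ($5 == "ViroHEX") viro = 1; next }      # A.txt: note whether ViroHEX occurs in column 5
  viro && $1 == "Software Version" { $2 = "VIRO_v1" }  # B.txt: swap the value; $0 is rebuilt with OFS
  1                                                    # print every B.txt line
' A.txt FS=" = " OFS=" = " B.txt > result.txt

cat result.txt

# Variant (assumption: only line 2 of B.txt may change). sub() edits $0 in place,
# so the field separators for B.txt do not need to be set at all.
awk -F'\t' '
  NR==FNR { if ($5 == "ViroHEX") viro = 1; next }
  viro && FNR == 2 { sub(/=.*/, "= VIRO_v1") }
  1
' A.txt B.txt > result_line2.txt
```

The FS=" = " and OFS=" = " operands sit between the two file names, so awk applies them only after it has finished reading A.txt. With the sample data, line 2 of result.txt reads Software Version = VIRO_v1 and the other lines are unchanged.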

Related Solutions

Sed/awk replace a specific pattern under another pattern

Explanation:

You asked for each flag or option to be explained, and I've got the time, so here you go. I'm explaining the final (shortest) version out of the three Sed commands listed above.

The first part of the line is an address range: /startregex/,/stopregex/. The substitute command that follows the address range is only applied to lines from startregex to stopregex (inclusive).

In this case the start regex is /\[ABC\]/. Square brackets are usually special characters within a regex, so we put a backslash before each to signify literal square bracket characters.

The stop regex is /^\[/, which uses the special regex character ^ to signify the start of a line. This pattern will match any line that starts with a literal left square bracket ([).
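To see which lines such an address range selects, the range can be combined with sed's -n option and p command. This is only a sketch: sample.ini and its section and key names are hypothetical, made up for the illustration.

```shell
# Hypothetical INI-style sample; section and key names are invented for illustration.
printf '%s\n' '[XYZ]' 'value1=bla' '[ABC]' 'value1=bla' 'value2=foo' '[DEF]' 'value1=bla' > sample.ini

# -n suppresses automatic printing; p prints only the lines inside the address range.
sed -n '/\[ABC\]/,/^\[/p' sample.ini
```

This prints the four lines from [ABC] through [DEF]. The stop regex is first tested on the line after the one that matched the start regex, which is why the [ABC] line does not end the range even though it also starts with a left square bracket.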

The substitute command is quite simple; the general format is s/findregex/replacetext/. It can also take flags after the final / to modify its behavior, but I'm not using any here.

The "find regex" is ^\(value1=\).*$.

The caret (^) matches the start of the line, as mentioned earlier, and the dollar sign ($) matches the end of the line. So this whole pattern must match an entire line, not merely part of one.

The parentheses (()), unlike square brackets, are non-special by default in sed's basic regexes, so we put backslashes before them to give them their special meaning. They allow parts of the matched text (the text matched by the "find regex") to be used in the replacement text. Specifically, the \1 in the replacement text means, "The text matched within the first set of parentheses in the regex." In this case, that is always just "value1=".
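The capture group can be tried on a single line of input. This sketch assumes a line of the form value1=bla, which is my own sample input. The second command uses the -E option (extended regexes, supported by GNU and BSD sed), where the parentheses are special without backslashes.

```shell
# \( \) captures "value1="; \1 puts it back in front of the new text.
printf 'value1=bla\n' | sed 's/^\(value1=\).*$/\1notbla/'

# Same substitution with extended regexes: parentheses need no backslashes.
printf 'value1=bla\n' | sed -E 's/^(value1=).*$/\1notbla/'
```

Both commands print value1=notbla.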

The final element in the "find regex" is .*. The dot (.) means "any single character," and the asterisk (*) means "any number of times (zero or more)." So the dot star (.*) matches the entire rest of the line, after the equals sign.

"notbla" in the replacement text is just static text, nothing special about it.

To really learn Sed properly, I highly recommend the Grymoire Sed tutorial, which is free online.
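The command being explained is not reproduced here, so the following is a reconstruction assembled from the pieces described in this explanation (the address range, the find regex, and the replacement \1notbla). Treat it as a sketch rather than the exact original; sample.ini and its contents are hypothetical.

```shell
# Hypothetical INI-style sample; section and key names are invented for illustration.
printf '%s\n' '[XYZ]' 'value1=bla' '[ABC]' 'value1=bla' 'value2=foo' '[DEF]' 'value1=bla' > sample.ini

# Between the [ABC] line and the next line starting with "[",
# replace everything after "value1=" with "notbla".
sed '/\[ABC\]/,/^\[/s/^\(value1=\).*$/\1notbla/' sample.ini > sample.out

cat sample.out
```

Only the value1 line under [ABC] changes to value1=notbla; the value1 lines under [XYZ] and [DEF] fall outside the range and stay as they are.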
