Awk – How to Use Shell Variable Inside a Pattern

awk

I have a text file with tab-separated columns which I'd like to process using awk.

Here's an example of such a file:

size=1\tname=foo\tweight=1.2
weight=2.5\tname=bar\tsize=2

What I want to achieve is to normalize the numeric value in columns whose content is like $field_name=<number> to four decimal places and keep the rest as is. Here, $field_name is a shell variable that is passed to awk and I'd like to use its value inside a regex.

Here's a snippet (which is not working of course). I'm particularly interested in fixing line #5 in the following awk script and not solutions using other tools, such as sed, perl, python, etc.

$ cat "${file}" \                                       # 1
    | awk -F "\t" -v field_name="${external_var}" '     # 2
      {                                                 # 3
        for (i = 1; i <= NF; i++) {                     # 4
          if ($i ~ /$field_name=[0-9]*.?[0-9]+/) {      # 5
            split($i, kv, "=")                          # 6
            $i = sprintf("%s=%.4f", kv[1], kv[2])       # 7
          }                                             # 8
        }                                               # 9
        print $0                                        # 10
      }'

Best Answer

That should be:

if ($i ~ field_name "=[0-9]*.?[0-9]+") ...

Or:

 regexp = field_name "=[0-9]*.?[0-9]+"
 if ($i ~ regexp) ...

Note that . matches any single character. If you want to match a literal ., you'd need regexp to contain \. (which inside double quotes would have to be written \\.) or [.].

 regexp = field_name "=[0-9]*\\.?[0-9]+"

I'd also expect you'd want to anchor the regexp:

 regexp = "^" field_name "=[0-9]*\\.?[0-9]+$"
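
Putting the pieces together, here is a minimal sketch of the corrected script, run on the sample data from the question. The `input` and `result` variable names and the `-v OFS='\t'` setting are assumptions, not part of the original snippet. `OFS` is added because assigning to `$i` makes awk rebuild the record with the output field separator, which defaults to a single space.

```shell
# Sample input in the question's format (real TAB separators).
input=$(printf 'size=1\tname=foo\tweight=1.2\nweight=2.5\tname=bar\tsize=2\n')
external_var=size

result=$(printf '%s\n' "$input" | awk -F '\t' -v OFS='\t' -v field_name="$external_var" '
  BEGIN {
    # Build the dynamic regexp once; "\\." in the string becomes \. in the regexp.
    regexp = "^" field_name "=[0-9]*\\.?[0-9]+$"
  }
  {
    for (i = 1; i <= NF; i++) {
      if ($i ~ regexp) {
        split($i, kv, "=")
        $i = sprintf("%s=%.4f", kv[1], kv[2])
      }
    }
    print
  }')

printf '%s\n' "$result"
```

Only the `size` fields are rewritten; the `name` and `weight` fields pass through unchanged.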

Other notes:

cat "${file}" is a useless use of cat (UUOC). Compared with a redirection (awk ... < "$file"), it also has the drawbacks that it doesn't work when $file starts with - and that awk still runs even if the file can't be opened.
-v field_name="$external_var" mangles backslashes, because awk processes escape sequences in -v assignments. Another approach that doesn't have the problem is to use an environment variable: FIELD="$external_var" awk ... and refer to it within awk as ENVIRON["FIELD"].
As the content of field_name is copied verbatim into regexp, it is treated as a regexp, so if $external_var contains regexp operators (.+*?{}()[]\^$...), it may not work properly.
In some locales and awk implementations, [0-9] matches a lot more characters than just 0123456789 (though I suspect it would be (non-ASCII) characters unlikely to occur in your input).
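
One way to sidestep both the backslash mangling and the regexp-operator problem while staying in awk is to pass the name through the environment and compare it as a plain string rather than embedding it in a regexp. The following is only a sketch under that assumption: the `FIELD` value `a.b`, the sample input, and the `literal_result` variable are made up for illustration.

```shell
# "a.b" contains a regexp operator; as a regexp it would also match "axb".
input=$(printf 'a.b=1\taxb=2\tname=foo\n')

literal_result=$(printf '%s\n' "$input" | FIELD='a.b' awk -F '\t' -v OFS='\t' '
  {
    for (i = 1; i <= NF; i++) {
      eq = index($i, "=")              # position of the first "=" (0 if none)
      if (eq == 0) continue
      key = substr($i, 1, eq - 1)
      val = substr($i, eq + 1)
      # String comparison on the key, static regexp on the value only.
      if (key == ENVIRON["FIELD"] && val ~ /^[0-9]*\.?[0-9]+$/)
        $i = sprintf("%s=%.4f", key, val)
    }
    print
  }')

printf '%s\n' "$literal_result"
```

Here the `axb=2` field is left alone, whereas a regexp built from `a.b` would have matched it.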

With perl:

FIELD=size <"$file" perl -lpe '
  s{
    (?<![^\t])       # not-preceded by a non-TAB
    \Q$ENV{FIELD}=\E # contents of $FIELD taken literally
    \K               # matched portion starts here
    \d*\.?\d+
    (?![^\t])        # not followed by a non-TAB
  }{
    sprintf "%.4f", $&
  }gxe'

This would not have any of the issues discussed above. It would also work better than awk with input that contains invalid text, such as a mix of text and binary data, or text encoded in a charset different from that of the user's locale.

Related Solutions

Bash Scripting – How to Use a Shell Variable in awk

You seem to be confusing awk variables and shell variables. awk -v vawk="$1" creates an awk variable called vawk, yet you are trying to use shell syntax ($vawk). This doesn't work because the shell doesn't have a variable called vawk. I think what you want is

awk -v vawk="$1" '$0 ~ vawk { c++ } # ...'
#                      ^ awk variable syntax
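
As a small self-contained illustration of the same idea, the sketch below wraps the command in a function so that `$1` is expanded by the shell while `vawk` is used inside awk. The function name `count_matches`, the `END` block, and the sample input are assumptions added for the example; they are not from the original question.

```shell
# Count the input lines that match the regexp given as the first argument.
count_matches() {
  # "$1" is expanded by the shell; vawk is the awk-side variable.
  awk -v vawk="$1" '$0 ~ vawk { c++ } END { print c + 0 }'
}

count=$(printf 'foo\nbar\nbaz bar\n' | count_matches 'ba')
printf '%s\n' "$count"
```

Inside the awk program the variable is written as plain `vawk`; writing `$vawk` there would instead mean "the field whose number is the value of vawk".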

shell – Pass Shell Variable as a Pattern to awk

Use awk's ~ operator, and you don't need to provide a literal regex on the right-hand side:

function _process () {
    awk -v l="$line" -v pattern="$1" '
        $0 ~ pattern {p=1} 
        END {if(p) print l >> "outfile.txt"}
    '  
}

Although this would be more efficient, since grep -q can stop at the first match instead of reading the whole file:

function _process () {
    grep -q "$1" && echo "$line"
}

Depending on the pattern, you may want grep -Eq "$1" so that it is treated as an extended regular expression.
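
For example, a pattern that uses alternation needs the -E variant. This is a rough sketch only: the value of `line`, the sample input, and the `out` variable are invented for the example, the function is written in the portable `name() { ...; }` form, and `--` is added so that a pattern starting with - is not taken as an option.

```shell
line='matched'   # hypothetical value; the original sets $line elsewhere

_process() {
  # -E: extended regexp, -q: exit at the first match without printing it.
  grep -Eq -- "$1" && echo "$line"
}

out=$(printf 'alpha\nbeta\n' | _process 'al(ph|f)a')
printf '%s\n' "$out"
```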
