I have a text file with tab-separated columns which I'd like to process using awk.
Here's an example of such a file:
size=1\tname=foo\tweight=1.2
weight=2.5\tname=bar\tsize=2
What I want to achieve is to normalize the numeric value in columns whose content is like $field_name=<number> to four decimal places and keep the rest as is. Here, $field_name is a shell variable that is passed to awk and I'd like to use its value inside a regex.
Here's a snippet (which is not working of course). I'm particularly interested in fixing line #5 in the following awk script and not solutions using other tools, such as sed, perl, python, etc.
$ cat "${file}" \ # 1
  | awk -F "\t" -v field_name="${external_var}" ' # 2
  { # 3
    for (i = 1; i <= NF; i++) { # 4
      if ($i ~ /$field_name=[0-9]*.?[0-9]+/) { # 5
        split($i, kv, "=") # 6
        $i = sprintf("%s=%.4f", kv[1], kv[2]) # 7
      } # 8
    } # 9
    print $0 # 10
  }'
Best Answer
That should be:
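A minimal sketch of that fix, assuming the intent is to build the regexp by concatenating the awk variable with a string (a string on the right-hand side of `~` is used as a dynamic regexp). The `normalize_field` wrapper and the `OFS` setting are illustrative additions, not part of the question's script; without `OFS`, assigning to `$i` would rejoin the fields with spaces.

```shell
# Sketch only: normalize_field and -v OFS are illustrative additions.
normalize_field() {
  # $1: field name; the tab-separated input is read from stdin.
  awk -F '\t' -v OFS='\t' -v field_name="$1" '
    {
      for (i = 1; i <= NF; i++) {
        # The concatenated string is interpreted as a regexp by the ~ operator.
        if ($i ~ (field_name "=[0-9]*.?[0-9]+")) {
          split($i, kv, "=")
          $i = sprintf("%s=%.4f", kv[1], kv[2])
        }
      }
      print
    }'
}

printf 'size=1\tname=foo\tweight=1.2\n' | normalize_field weight
```

With the sample line, only the `weight` column changes, to `weight=1.2000`.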
Or:
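A sketch of the alternative, which stores the concatenated string in a variable first. The variable is called `regexp` here to match the name used in the notes that follow; the wrapper name `normalize_field_var` is hypothetical.

```shell
# Sketch only: normalize_field_var is a hypothetical wrapper name.
normalize_field_var() {
  # $1: field name; the tab-separated input is read from stdin.
  awk -F '\t' -v OFS='\t' -v field_name="$1" '
    # Build the dynamic regexp once instead of once per column.
    BEGIN { regexp = field_name "=[0-9]*.?[0-9]+" }
    {
      for (i = 1; i <= NF; i++)
        if ($i ~ regexp) {
          split($i, kv, "=")
          $i = sprintf("%s=%.4f", kv[1], kv[2])
        }
      print
    }'
}

printf 'weight=2.5\tname=bar\tsize=2\n' | normalize_field_var size
```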
Note that `.` matches any single character. If you want to match a literal `.`, you'd need `regexp` to contain `\.` (which inside double quotes would have to be written `\\.`) or `[.]`.
I'd also expect you'd want to anchor the regexp:
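As a sketch of what anchoring could look like, assuming the whole column must have the form `name=number` and the dot is meant literally. The helper `matches_field` is hypothetical and only reports whether a column matches.

```shell
# Sketch only: matches_field is a hypothetical helper for trying out the regexp.
matches_field() {
  # $1: field name, $2: candidate column; prints "match" or "no match".
  awk -v field_name="$1" -v col="$2" '
    BEGIN {
      # ^ and $ force the whole column to match; [.] is a literal dot.
      regexp = "^" field_name "=[0-9]*[.]?[0-9]+$"
      print (col ~ regexp ? "match" : "no match")
    }'
}

matches_field weight 'weight=1.2'     # match
matches_field weight 'netweight=1.2'  # no match: rejected by the ^ anchor
matches_field weight 'weight=1x2'     # no match: x is not a literal dot
```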
Other notes:
- `cat "${file}"` is a UUOC (useless use of cat), which also has the drawback (over a redirection) that it doesn't work when `$file` starts with `-`, and that `awk` still runs if the file can't be opened.
- `-v field_name="$external_var"` mangles backslashes. Another approach that doesn't have the problem is to use an environment variable: `FIELD="$external_var" awk ...`, and refer to it within `awk` as `ENVIRON["FIELD"]`.
- Since `field_name` is copied verbatim into `regexp`, it is treated as a regexp, so if `$external_var` contains regexp operators (`.+*?{}()[]\^$`...), it may not work properly.
- In some `awk` implementations, `[0-9]` matches a lot more characters than just `0123456789` (though I suspect it would be (non-ASCII) characters unlikely to occur in your input).
An equivalent written in `perl` would not have any of the issues discussed above (it would also work better than `awk` with input that contains invalid text, like a mix of text and binary data, or text encoded in a charset different from that of the user's locale).
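Staying with `awk`, the notes on redirection, `ENVIRON`, and regexp operators can be combined. This is a sketch under the assumption that the field name should be compared as a plain string rather than as a regexp; the function name `normalize_literal` and the `FIELD` variable name are illustrative.

```shell
# Sketch only: normalize_literal and FIELD are illustrative names.
normalize_literal() {
  # $1: field name (taken literally), $2: input file.
  # The redirection means awk is not started if the file cannot be opened.
  FIELD="$1" awk -F '\t' -v OFS='\t' '
    BEGIN {
      # ENVIRON passes the value through without backslash processing.
      prefix = ENVIRON["FIELD"] "="
      n = length(prefix)
    }
    {
      for (i = 1; i <= NF; i++) {
        value = substr($i, n + 1)
        # Plain string comparison for the name, regexp only for the number;
        # the digits are listed explicitly instead of using a range.
        if (substr($i, 1, n) == prefix &&
            value ~ /^[0123456789]*[.]?[0123456789]+$/)
          $i = sprintf("%s%.4f", prefix, value)
      }
      print
    }' < "$2"
}
```

It could be called as `normalize_literal "$external_var" "$file"`. A field name such as `a.b` then only matches columns that literally start with `a.b=`, not `axb=`.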