Escape sequences needed when using tilde ~ operator in awk

awk escape-characters

I have a pattern variable with the following value:

\"something//\\anotherthing'

and a file with the following contents:

\"something//\\anotherthing'
\"something//\\anotherthing
\"something/\anotherthing'
\"something\anotherthing'
\\"something\/\/\\\\anotherthing'

When I compare each line read from the file against the pattern in the environment with the == operator, I get the expected output:

patt="$pattern" awk '{print $0, ENVIRON["patt"], ($0 == ENVIRON["patt"]?"YES":"NO") }'  OFS="\t" file
\"something//\\anotherthing'    \"something//\\anotherthing'    YES
\"something//\\anotherthing     \"something//\\anotherthing'    NO
\"something/\anotherthing'      \"something//\\anotherthing'    NO
\"something\anotherthing'       \"something//\\anotherthing'    NO
\\"something\/\/\\\\anotherthing'       \"something//\\anotherthing'    NO

But when I do the same with the ~ operator, the tests never match.
(I expected YES on the first line, as above):

patt="$pattern" awk '{print $0, ENVIRON["patt"], ($0 ~ ENVIRON["patt"]?"YES":"NO") }'  OFS="\t" file
\"something//\\anotherthing'    \"something//\\anotherthing'    NO
\"something//\\anotherthing     \"something//\\anotherthing'    NO
\"something/\anotherthing'      \"something//\\anotherthing'    NO
\"something\anotherthing'       \"something//\\anotherthing'    NO
\\"something\/\/\\\\anotherthing'       \"something//\\anotherthing'    NO

To make the ~ comparison work, I need to double every backslash in the pattern:

patt="${pattern//\\/\\\\}" awk '{print $0, ENVIRON["patt"], ($0 ~ ENVIRON["patt"]?"YES":"NO") }'  OFS="\t" file
\"something//\\anotherthing'    \\"something//\\\\anotherthing' YES
\"something//\\anotherthing     \\"something//\\\\anotherthing' NO
\"something/\anotherthing'      \\"something//\\\\anotherthing' NO
\"something\anotherthing'       \\"something//\\\\anotherthing' NO
\\"something\/\/\\\\anotherthing'       \\"something//\\\\anotherthing' NO

Note the doubled backslashes in the second column, which is the printed value of ENVIRON["patt"].

Question:

Where does the escape processing happen in awk when using the tilde ~ comparison operator? On $0 (or $1, $2, …) or on ENVIRON["variable"]?

Best Answer

The ~ operator does pattern matching, treating the right-hand operand as an (extended) regular expression and the left-hand one as a string. POSIX says:

A regular expression can be matched against a specific field or string by using one of the two regular expression matching operators, '~' and "!~". These operators shall interpret their right-hand operand as a regular expression and their left-hand operand as a string.

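So ENVIRON["patt"] is treated as a regular expression, and every character that is special in EREs needs to be escaped if you don't want it to have its regular ERE meaning. In your pattern that applies to the backslashes: in an ERE, \\ matches a single literal backslash, so the undoubled pattern describes a different string from the one in the file.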
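To see the effect in isolation, here is a minimal sketch using a shortened, hypothetical string (a\\b rather than the pattern from the question). It compares the raw pattern, the pattern with doubled backslashes, and, as an alternative that is not part of the question, awk's index() function, which does a plain substring search and so needs no escaping at all.

```shell
# Hypothetical sample values, not taken from the question's file.
line='a\\b'   # the string under test: a, two backslashes, b
patt='a\\b'   # the same four characters, used as the pattern

# As an ERE, \\ means ONE literal backslash, so this pattern
# describes a\b and does not match the line.
raw=$(printf '%s\n' "$line" |
  patt="$patt" awk '{ print ($0 ~ ENVIRON["patt"] ? "YES" : "NO") }')

# With each backslash doubled, the ERE describes a\\b and matches.
doubled=$(printf '%s\n' "$line" |
  patt='a\\\\b' awk '{ print ($0 ~ ENVIRON["patt"] ? "YES" : "NO") }')

# index() treats both arguments as plain strings: no regex, no escaping.
literal=$(printf '%s\n' "$line" |
  patt="$patt" awk '{ print (index($0, ENVIRON["patt"]) ? "YES" : "NO") }')

printf 'raw: %s\ndoubled: %s\nindex: %s\n' "$raw" "$doubled" "$literal"
# raw: NO
# doubled: YES
# index: YES
```

Doubling only the backslashes is enough here because the sample contains no other ERE metacharacters; a pattern containing characters such as . or * would need those escaped as well. If a literal substring test is all that is needed, index() avoids the question entirely.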


Note that it's not about using $0 or ENVIRON["name"], but the left and right sides of the tilde. This would take the input lines (in $0) as the regular expression to match against:

str=foobar awk 'ENVIRON["str"] ~ $0 { 
     printf "pattern /%s/ matches string \"%s\"\n", $0, ENVIRON["str"] }'
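As a quick way to try this out, the same command can be fed a few patterns on standard input. The three input lines below are made up for illustration; each one is used as an ERE and tested against the fixed string foobar.

```shell
# Each input line is a regular expression; str is the string to test.
out=$(printf '%s\n' 'foo' '^bar' 'o+b' |
  str=foobar awk 'ENVIRON["str"] ~ $0 {
    printf "pattern /%s/ matches string \"%s\"\n", $0, ENVIRON["str"] }')

printf '%s\n' "$out"
# pattern /foo/ matches string "foobar"
# pattern /o+b/ matches string "foobar"
```

The line ^bar prints nothing because foobar does not start with bar, which shows that the input lines really are being interpreted as regular expressions rather than as plain strings.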