I have a pattern variable with the following value:
\"something//\\anotherthing'
and a file with the following contents:
\"something//\\anotherthing'
\"something//\\anotherthing
\"something/\anotherthing'
\"something\anotherthing'
\\"something\/\/\\\\anotherthing'
When I compare a line read from the file against the pattern in the environment with the == operator, I get the expected output:
patt="$pattern" awk '{print $0, ENVIRON["patt"], ($0 == ENVIRON["patt"]?"YES":"NO") }' OFS="\t" file
\"something//\\anotherthing' \"something//\\anotherthing' YES
\"something//\\anotherthing \"something//\\anotherthing' NO
\"something/\anotherthing' \"something//\\anotherthing' NO
\"something\anotherthing' \"something//\\anotherthing' NO
\\"something\/\/\\\\anotherthing' \"something//\\anotherthing' NO
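For contrast with the ~ results that follow, here is a reduced sketch of how the two operators differ. It assumes a POSIX sh and any POSIX awk, uses made-up input lines, and contrast is a hypothetical function name. == is exact whole-string equality, while ~ treats its right-hand operand as an ERE that may match anywhere in the line.

```shell
#!/bin/sh
# contrast: hypothetical helper; the input lines are made up for illustration.
contrast() {
  printf '%s\n' 'abc' 'a.c' 'xa.cx' |
    patt='a.c' awk '{
      # == is true only when the whole line equals the string exactly
      eq = ($0 == ENVIRON["patt"]) ? "YES" : "NO"
      # ~ treats the right-hand side as an ERE: "." matches any character,
      # and the match may start anywhere in the line (it is not anchored)
      re = ($0 ~ ENVIRON["patt"]) ? "YES" : "NO"
      print $0, eq, re
    }' OFS='\t'
}
contrast
```

Only a.c is equal to the pattern, but all three lines match it as a regular expression, because the dot stands for any character.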
But when I do the same with the ~ operator, no line ever matches (I expected YES on the first line, as above):
patt="$pattern" awk '{print $0, ENVIRON["patt"], ($0 ~ ENVIRON["patt"]?"YES":"NO") }' OFS="\t" file
\"something//\\anotherthing' \"something//\\anotherthing' NO
\"something//\\anotherthing \"something//\\anotherthing' NO
\"something/\anotherthing' \"something//\\anotherthing' NO
\"something\anotherthing' \"something//\\anotherthing' NO
\\"something\/\/\\\\anotherthing' \"something//\\anotherthing' NO
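Why every line prints NO: in an ERE, \\ stands for one literal backslash, so the pattern above effectively searches for "something//\anotherthing' with a single backslash, which no line in the file contains. (\" is not a defined ERE escape in POSIX; gawk, for example, reads it as a plain ".) The reduced sketch below shows the same effect with made-up input x\y and x\\y; it assumes a POSIX sh and awk, and backslash_demo is a hypothetical name.

```shell
#!/bin/sh
# backslash_demo: hypothetical helper with made-up input lines.
# ENVIRON values reach awk without escape processing, so patt holds
# two backslashes; as an ERE, \\ then matches ONE literal backslash.
backslash_demo() {
  printf '%s\n' 'x\y' 'x\\y' |
    patt='x\\y' awk '{ print $0, (($0 ~ ENVIRON["patt"]) ? "YES" : "NO") }' OFS='\t'
}
backslash_demo
```

The line with two backslashes, the one that looks identical to patt, is exactly the one that does not match.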
To fix the ~ comparison, I have to double every backslash (escape the escapes):
patt="${pattern//\\/\\\\}" awk '{print $0, ENVIRON["patt"], ($0 ~ ENVIRON["patt"]?"YES":"NO") }' OFS="\t" file
\"something//\\anotherthing' \\"something//\\\\anotherthing' YES
\"something//\\anotherthing \\"something//\\\\anotherthing' NO
\"something/\anotherthing' \\"something//\\\\anotherthing' NO
\"something\anotherthing' \\"something//\\\\anotherthing' NO
\\"something\/\/\\\\anotherthing' \\"something//\\\\anotherthing' NO
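Doubling backslashes works here only because the backslash is the sole ERE-special character in this particular pattern. A more general sketch, assuming POSIX sh, sed and awk, escapes every ERE metacharacter (. [ \ ( ) * + ? { | ^ $) before handing the text to awk. ere_escape and match_literal are hypothetical helper names, not standard tools, and the text is assumed to be a single line.

```shell
#!/bin/sh
# ere_escape (hypothetical): prefix each character that is special in a
# POSIX ERE with a backslash so that awk's ~ matches it literally.
ere_escape() {
  printf '%s\n' "$1" | sed 's#[[\\.^$*+?(){|]#\\&#g'
}

# match_literal (hypothetical): print the lines of the given files (or of
# stdin if none) that contain $1 as a literal substring.
match_literal() {
  text=$1
  shift
  # Pass the escaped text through the environment, so awk does no
  # escape processing of its own on it (unlike -v or name=value operands).
  patt=$(ere_escape "$text") awk '$0 ~ ENVIRON["patt"]' "$@"
}

# Only the literal a.c line is printed, not abc.
printf '%s\n' 'a.c' 'abc' | match_literal 'a.c'
```

With the question's variables, the call would be match_literal "$pattern" file.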
Note the doubled backslashes in the second column, where ENVIRON["patt"] is printed.
Question:
Where does the escape processing happen in awk when the tilde (~) comparison operator is used: on $0 (or $1, $2, …) or on ENVIRON["variable"]?
Best Answer
The ~ operator does pattern matching: it treats the right-hand operand as an (extended) regular expression and the left-hand one as a string. POSIX specifies this behavior for both ~ and !~.
So ENVIRON["patt"] is treated as a regular expression, and every character that is special in an ERE needs to be escaped if you don't want it to have its regular ERE meaning.
Note that it's not about using $0 or ENVIRON["name"], but about the left and right sides of the tilde. This would take the input lines (in $0) as the regular expression to match against: