Why can’t a constant regular expression be put on the left side of the ~ operator in gawk?

Why can't I put a regular expression on the left side of the ~ operator when using gawk?

For example, given the file below, with fields delimited by tabs (\t):

$ cat cats
siberian    1970    73  2500
shorthair   1999    60  3000
longhair    1998    102 9859
scottish    2001    30  6000

If I use gawk to find a record, it works:

$ gawk '$1 ~ /h/' cats
shorthair   1999    60  3000
longhair    1998    102 9859
scottish    2001    30  6000

However, if I swap the operands $1 and /h/, it doesn't:

$ gawk '/h/ ~ $1' cats
gawk: cmd. line:1: warning: regular expression on left of `~' or `!~' operator

The gawk man page for the ~ operator says:

Regular expression match, negated match. NOTE: Do not use a constant
regular expression (/foo/) on the left-hand side of a ~ or !~. Only
use one on the right-hand side. The expression /foo/ ~ exp has the
same meaning as (($0 ~ /foo/) ~ exp). This is usually not what was
intended.

I don't understand how the expression /foo/ comes to be evaluated as ($0 ~ /foo/). The note also seems to make only the weaker claim that "bad things will happen if you put a constant regular expression on the left"; it doesn't make the stronger claim that "the behaviour of gawk is undefined if you put a constant regular expression on the left, because it wasn't programmed to be used in this way".

Basically, I don't understand how the ~ operator is evaluated internally.

Best Answer

To quote the POSIX spec for awk:

When an ERE token appears as an expression in any context other than as the right-hand of the ~ or !~ operator or as one of the built-in function arguments described below, the value of the resulting expression shall be the equivalent of:

$0 ~ /ere/

This (combined with the action defaulting to { print }) is why you can use awk as a grep substitute by just doing awk '/b/' <file.
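As a rough illustration of that default, the sketch below runs a bare pattern and its spelled-out equivalent over the same input and keeps both results for comparison. The two-line sample data and the variable names are made up for the example; plain awk is used, and gawk should behave the same way here.

```shell
# Hypothetical sample input: only the first line contains an "h".
data='shorthair 1999
siberian 1970'

# Bare ERE as the whole pattern, with the default action.
bare=$(printf '%s\n' "$data" | awk '/h/')

# The same rule with the implied match and action written out.
full=$(printf '%s\n' "$data" | awk '$0 ~ /h/ { print }')

printf '%s\n' "$bare"   # prints: shorthair 1999
```

Both variables end up holding the same single line, which is what the quoted rule predicts.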

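So, the answer is just "it's defined to work that way". /ere/ is defined to mean $0 ~ /ere/ except in certain circumstances, and /ere/ ~ $1 is not one of the exceptional circumstances, so it gets evaluated as ($0 ~ /ere/) ~ $1. The left side therefore yields 1 or 0, and that value is then matched against $1 used as a regular expression. With the cats file, neither "1" nor "0" matches a pattern such as shorthair, which is why nothing is printed.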
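To make the equivalence visible, here is a small sketch using contrived input in which the first field is itself "0" or "1", so the outer match can actually succeed. The sample data and variable names are assumptions for illustration only. The implicit form's stderr is discarded because gawk prints its warning there.

```shell
# Hypothetical sample input: field 1 is deliberately "0" or "1".
data='1 has-h
0 none
1 none
0 has-h'

# Implicit form: the ERE on the left is treated as ($0 ~ /h/).
implicit=$(printf '%s\n' "$data" | awk '/h/ ~ $1' 2>/dev/null)

# Explicit form: the same expression written out in full.
explicit=$(printf '%s\n' "$data" | awk '($0 ~ /h/) ~ $1')

printf '%s\n' "$implicit"
# prints:
# 1 has-h
# 0 none
```

A record is selected only when the result of testing the whole record for "h" (1 or 0), taken as a string, matches the regex held in $1. So "1 has-h" and "0 none" pass, while "1 none" and "0 has-h" do not, and both forms select the same records.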
