Problem with regular expression in gawk ('\<' not working)


I am trying to get into bash a little bit. I was going through this guide, but the second example in this subsection on gawk doesn't seem to work.

The problem is this command:

ls -l | awk '/\<(a|x).*\.conf$/ { print $9 }'

It works only if I replace \< with a space. I also tried using \y, but no luck there either.

Does anyone have any idea what the problem might be here?

Thanks 🙂

Best Answer

The GNU awk manual (sec. 3.5) documents that the regex operator \< is specific to gawk, so you should not expect it to work in other awk implementations.

According to man mawk, if you place a backslash in front of a nonspecial character, the backslash is removed. Thus, under mawk (the awk installed by default on Debian and Ubuntu), \< is interpreted simply as a literal < character.
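If you are not sure which implementation your plain awk actually is, a quick check like the one below may help. This is a sketch that assumes a Linux system where readlink -f is available (GNU coreutils or BusyBox); on Debian and Ubuntu, /usr/bin/awk is normally a symlink managed by update-alternatives, so resolving it typically reveals mawk or gawk.

```shell
# Resolve the symlink chain behind "awk" (on Debian/Ubuntu it usually
# goes through /etc/alternatives/awk) to see which implementation runs.
readlink -f "$(command -v awk)"

# gawk and mawk both accept -W version; show only the first line.
# </dev/null keeps an awk that ignores -W from waiting on the terminal.
awk -W version </dev/null 2>&1 | head -n 1

# If gawk is installed, calling it by name keeps the \< operator working.
if command -v gawk >/dev/null 2>&1; then
    ls -l | gawk '/\<(a|x).*\.conf$/ { print $9 }'
fi
```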

Examples

I simplified the regex to provide examples of the different behavior:

$ echo -e " a\n ab.conf\n <ac.conf" | gawk '/\<(a|x)/ { print }'
 a
 ab.conf
 <ac.conf
$ echo -e " a\n ab.conf\n <ac.conf" | mawk '/\<(a|x)/ { print }'
 <ac.conf

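Again, gawk interprets \< as the beginning of a word, while mawk treats it as a literal <. The same goes for \y: in gawk it is the word-boundary operator, but mawk drops the backslash and reads it as a plain y.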
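For the original command, a version that does not rely on \< at all may be more robust. The sketch below (my own variant, not the guide's) tests only the file-name field and anchors it with ^, which every POSIX awk supports. Like the original, it assumes ls -l prints the name in field 9, which holds in the C locale but not necessarily with other locales or time styles, and that file names contain no spaces.

```shell
# Match only the file-name field ($9) and anchor it with ^ instead of
# the gawk-only \< operator; ^ and $ behave the same in gawk, mawk and
# other POSIX awks. LC_ALL=C keeps the ls -l date in three fields.
LC_ALL=C ls -l | awk '$9 ~ /^(a|x).*\.conf$/ { print $9 }'
```

A side benefit is that other columns can no longer match. With /\<(a|x).*\.conf$/ applied to the whole line, a file owned by a (hypothetical) user alice would be printed even if its own name did not start with a or x.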

What does POSIX say about this issue

The GNU awk manual explains:

If you place a backslash in a string constant before something that is not one of the characters previously listed, POSIX awk purposely leaves what happens as undefined.

In other words, each awk implementation is free to decide for itself what \< means.
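Because the meaning of \< is left open, a script meant to run under any awk can spell out the word-start condition itself. One possible approach, sketched below, prepends a space to the line and then requires a non-word character immediately before the a or x. It assumes ASCII word characters (letters, digits, underscore), whereas gawk's \< follows the locale, and it writes the class out as [^A-Za-z0-9_] because some older mawk releases lack POSIX classes such as [:alnum:].

```shell
# Portable stand-in for gawk's /\<(a|x)/: prepend a space so a match at
# the very start of the line also has a preceding non-word character,
# then require a non-word character right before "a" or "x".
# A pattern with no action prints the original, unmodified $0.
printf ' a\n ab.conf\n <ac.conf\n' | awk '(" " $0) ~ /[^A-Za-z0-9_](a|x)/'

# The same idea applied to the guide's full expression.
ls -l | awk '(" " $0) ~ /[^A-Za-z0-9_](a|x).*\.conf$/ { print $9 }'
```

The first command prints all three lines, matching the gawk output in the examples rather than the mawk output.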
