AWK Regular Expressions – Reduce Greediness

I want to do non-greedy pattern (regular expression) matching in awk.
Here is an example:

echo "@article{gjn, Author =   {Grzegorz J. Nalepa}, " | awk '{ sub(/@.*,/,""); print }'

Is it possible to write a regular expression that selects the shorter string?

@article{gjn,

instead of this long string?:

@article{gjn, Author =   {Grzegorz J. Nalepa},

I want to get this result:

 Author =   {Grzegorz J. Nalepa},


I have another example:

echo ",article{gjn, Author =   {Grzegorz J. Nalepa}, " | awk '{ sub(/,[^,]*,/,""); print }'
      ↑                                                              ↑^^^^^

Note that I changed the @ characters to comma (,) characters
in the first position of both the input string and the regular expression
(and also changed .* to [^,]*). 
Is it possible to write a regular expression that selects the shorter string?

, Author =   {Grzegorz J. Nalepa},

instead of the longer string?:

,article{gjn, Author =   {Grzegorz J. Nalepa},

I want to get this result:

,article{gjn

Best Answer

If you want to select @ and up to the first , after that, you need to specify it as @[^,]*,

That is @ followed by any number (*) of non-commas ([^,]) followed by a comma (,).
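As a quick check, here is that pattern applied to the sample input from the question. The function name strip_first_field is only a hypothetical wrapper added for illustration; the awk program inside it is the question's one-liner with the regular expression swapped for @[^,]*,

```shell
# Hypothetical wrapper around the question's one-liner, using @[^,]*, instead of @.*,
strip_first_field() {
  # [^,]* cannot cross a comma, so the match ends at the first comma after the @
  awk '{ sub(/@[^,]*,/, ""); print }'
}

echo '@article{gjn, Author =   {Grzegorz J. Nalepa}, ' | strip_first_field
# prints " Author =   {Grzegorz J. Nalepa}, " (with the leading and trailing space)
```

With the original @.*, the same input would lose everything up to the last comma, because .* is free to run past the earlier ones.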

That works as the equivalent of the non-greedy @.*?, (with a trailing comma), but not for things like @.*?string, where what follows is more than a single character. Negating a single character is easy, but negating a string in a regexp is much harder.

A different approach is to pre-process your input so that each occurrence of the string is prefixed with (or replaced by) a marker character that otherwise doesn't occur in your input:

gsub(/string/, "\1&") # pre-process
gsub(/@[^\1]*\1string/, "")
gsub(/\1/, "") # revert the pre-processing
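A minimal end-to-end sketch of those three steps, assuming the byte with octal code 001 never occurs in the input. The function name strip_to_first_string and the sample text are made up for illustration. As a variation on the lines above, the marker is kept in an awk variable (m) and the regular expression is built from a string, which avoids depending on how a given awk treats \1 inside a regexp constant.

```shell
# Hypothetical helper: remove from "@" through the FIRST following "string".
strip_to_first_string() {
  awk '
    BEGIN { m = "\001" }                 # marker byte, assumed absent from the input
    {
      gsub(/string/, m "&")              # pre-process: prefix every "string" with the marker
      gsub("@[^" m "]*" m "string", "")  # [^m]* cannot run past the first marker
      gsub(m, "")                        # revert: drop the markers that are left
      print
    }'
}

echo 'keep @junk string tail string end' | strip_to_first_string
# prints "keep  tail string end" (two spaces after "keep")
```

For comparison, a plain greedy gsub(/@.*string/, "") on the same line would also swallow " tail string" and leave "keep  end".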

If you can't guarantee that the input won't contain your replacement character (\1 above), one approach is to use an escaping mechanism:

gsub(/\1/, "\1\3") # use \1 as the escape character and escape itself as \1\3
                   # in case it's present in the input
gsub(/\2/, "\1\4") # use \2 as our marker character and escape it
                   # as \1\4 in case it's present in the input
gsub(/string/, "\2&") # mark the "string" occurrences

gsub(/@[^\2]*\2string/, "")

# then roll back the marking and escaping
gsub(/\2/, "")
gsub(/\1\4/, "\2")
gsub(/\1\3/, "\1")
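Put together, the escape, mark, match and roll-back steps could look like the sketch below. The function name strip_escaped and the test input are hypothetical, and the control bytes are again held in awk variables rather than written as \1 ... \4 inside regexp constants. The sample input deliberately contains both a \001 and a \002 byte to show that they survive the round trip.

```shell
# Hypothetical helper: same removal as before, but safe when the input
# already contains the escape byte (\001) or the marker byte (\002).
strip_escaped() {
  awk '
    BEGIN { esc = "\001"; mark = "\002"; e_esc = "\003"; e_mark = "\004" }
    {
      gsub(esc, esc e_esc)       # escape the escape byte itself as \001\003
      gsub(mark, esc e_mark)     # escape any pre-existing marker as \001\004
      gsub(/string/, mark "&")   # mark the "string" occurrences

      gsub("@[^" mark "]*" mark "string", "")

      gsub(mark, "")             # then roll back the marking ...
      gsub(esc e_mark, mark)     # ... and the escaping, in reverse order
      gsub(esc e_esc, esc)
      print
    }'
}

# The input holds a literal \002 before the match and a literal \001 inside it.
printf 'a\002b @x\001y string z string\n' | strip_escaped | od -c
```

Piping through od -c is only there to make the control bytes visible; the text from @ through the first "string" is removed and the \002 byte is still in place.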

That works for fixed strings, but not for arbitrary regexps, such as the equivalent of @.*?foo.bar.
