I want to do non-greedy pattern (regular expression) matching in awk.
Here is an example:
echo "@article{gjn, Author = {Grzegorz J. Nalepa}, " | awk '{ sub(/@.*,/,""); print }'
Is it possible to write a regular expression that selects the shorter string?
@article{gjn,
instead of this long string?:
@article{gjn, Author = {Grzegorz J. Nalepa},
I want to get this result:
Author = {Grzegorz J. Nalepa},
I have another example:
echo ",article{gjn, Author = {Grzegorz J. Nalepa}, " | awk '{ sub(/,[^,]*,/,""); print }'
Note that I changed the @ characters to comma (,) characters in the first position of both the input string and the regular expression (and also changed .* to [^,]*).
Is it possible to write a regular expression that selects the shorter string?
, Author = {Grzegorz J. Nalepa},
instead of the longer string?:
,article{gjn, Author = {Grzegorz J. Nalepa},
I want to get this result:
,article{gjn
Best Answer
If you want to select @ and everything up to the first , after it, you need to specify it as @[^,]*,

That is @ followed by any number (*) of non-commas ([^,]) followed by a comma (,).

That approach works as the equivalent of @.*?, but not for things like @.*?string, that is, where what comes after is more than a single character. Negating a character is easy, but negating strings in regexps is a lot more difficult.

A different approach is to pre-process your input to replace or prepend the string with a character that otherwise doesn't occur in your input.

If you can't guarantee that the input won't contain your replacement character (\1, for example), one approach is to use an escaping mechanism.

That works for fixed strings but not for arbitrary regexps, like the equivalent of @.*?foo.bar.
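
As a rough sketch of the three approaches described above: the function names are made up for illustration, and \001 and \002 (the same characters as \1 and \2, written with three octal digits) are assumed to be acceptable marker and escape characters. The word "string" stands for whatever fixed terminator you need.

```shell
#!/bin/sh
# Hypothetical helper names; each reads lines on stdin and writes to stdout.

# 1. Emulate @.*?, : the terminator is one character, so negate it.
strip_to_first_comma() {
  awk '{ sub(/@[^,]*,/, ""); print }'
}

# 2. Emulate @.*?string : tag every "string" with a marker character.
#    Assumption: \001 never occurs in the input.
strip_to_first_string() {
  awk 'BEGIN { m = "\001" }
  {
    gsub(/string/, m "&")               # prepend the marker to each "string"
    sub("@[^" m "]*" m "string", "")    # a single marker character can be negated
    gsub(m, "")                         # drop the markers that are left
    print
  }'
}

# 3. Same idea, for input that may already contain \001 or \002.
#    \001 acts as the escape character, \002 as the marker.
strip_to_first_string_escaped() {
  awk 'BEGIN { e = "\001"; m = "\002"; e_code = "\003"; m_code = "\004" }
  {
    gsub(e, e e_code)                   # escape the escape character: \1 becomes \1\3
    gsub(m, e m_code)                   # escape existing markers: \2 becomes \1\4
    gsub(/string/, m "&")               # now \2 can only mean "string follows"
    sub("@[^" m "]*" m "string", "")
    gsub(m, "")                         # roll back the marking
    gsub(e m_code, m)                   # roll back the escaping, in reverse order
    gsub(e e_code, e)
    print
  }'
}

# The example from the question:
printf '%s\n' '@article{gjn, Author = {Grzegorz J. Nalepa}, ' | strip_to_first_comma
```

The regexps that involve the marker are built from string variables rather than written as /\1/ literals, so the sketch does not depend on how a given awk parses octal escapes inside regexp literals or bracket expressions.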