Non-Greedy Match with SED Regex – Emulating Perl’s .*?


I want to use sed to replace everything in a string
between the first AB and the first AC that follows it (inclusive)
with XXX.

For example, given this string (for testing only):

ssABteAstACABnnACss

I would like this output: ssXXXABnnACss.


I did this with perl:

$ echo 'ssABteAstACABnnACss' | perl -pe 's/AB.*?AC/XXX/'
ssXXXABnnACss

but I want to implement it with sed.
The same Perl-style pattern does not work in sed (even with -r); it replaces everything up to the last AC:

$ echo 'ssABteAstACABnnACss' | sed -re 's/AB.*?AC/XXX/'
ssXXXss
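A quick way to see why the sed attempt fails, assuming GNU sed: in its extended regex syntax the trailing ? is read as another greedy quantifier applied to .*, not as a laziness modifier, so the pattern behaves just like plain AB.*AC. The variable names below are only for illustration.

```shell
#!/bin/sh
s='ssABteAstACABnnACss'   # illustrative variable names throughout

# Assumes GNU sed: '?' after '.*' is just another (greedy) quantifier here
with_q=$(printf '%s\n' "$s" | sed -E 's/AB.*?AC/XXX/')
without_q=$(printf '%s\n' "$s" | sed -E 's/AB.*AC/XXX/')

printf '%s\n' "$with_q"      # ssXXXss
printf '%s\n' "$without_q"   # ssXXXss (same result: the match runs to the last AC)
```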

Best Answer

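Sed regexes use leftmost-longest matching: among the matches that start at the leftmost possible position, sed always takes the longest one. Sed has no non-greedy quantifier.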
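To illustrate leftmost-longest matching, here is a small comparison, assuming GNU sed (and, for the second line, Perl) is installed: with the alternation a|ab, Perl takes the first alternative that succeeds, while sed takes the longest match starting at the same position. The input abcd is made up for illustration.

```shell
#!/bin/sh
# Both alternatives match at position 0 of "abcd".
# sed (POSIX leftmost-longest) prefers the longer match "ab";
# Perl (backtracking, first successful alternative wins) stops at "a".
sed_out=$(printf 'abcd\n' | sed -E 's/a|ab/X/')
perl_out=$(printf 'abcd\n' | perl -pe 's/a|ab/X/')

printf 'sed:  %s\n' "$sed_out"    # sed:  Xcd
printf 'perl: %s\n' "$perl_out"   # perl: Xbcd
```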

What we want to do is match

  1. AB,
    followed by
  2. any amount of anything other than AC,
    followed by
  3. AC

Unfortunately, sed can’t do #2, at least not for a multi-character regular expression such as AC. For a single-character regular expression such as @ (or even [123]), we can write [^@]* or [^123]*. So we can work around sed’s limitation by changing every occurrence of AC to @ and then searching for

  1. AB,
    followed by
  2. any number of anything other than @,
    followed by
  3. @

like this:

sed 's/AC/@/g; s/AB[^@]*@/XXX/; s/@/AC/g'

The last part changes unmatched instances of @ back to AC.
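The three substitutions in that command can also be run one at a time to inspect the intermediate pattern space. This is just a sketch for illustration: the variable names are made up, and because it uses @ as the placeholder it assumes the input contains no @. It needs only POSIX sed.

```shell
#!/bin/sh
s='ssABteAstACABnnACss'

# Step 1: turn every two-character AC into the one-character placeholder @
step1=$(printf '%s\n' "$s" | sed 's/AC/@/g')
# Step 2: [^@]* cannot run past an @, so the match ends at the first AC after AB
step2=$(printf '%s\n' "$step1" | sed 's/AB[^@]*@/XXX/')
# Step 3: restore any placeholders that step 2 did not consume
step3=$(printf '%s\n' "$step2" | sed 's/@/AC/g')

printf '%s\n' "$step1"   # ssABteAst@ABnn@ss
printf '%s\n' "$step2"   # ssXXXABnn@ss
printf '%s\n' "$step3"   # ssXXXABnnACss
```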

But this is a risky approach because the input could already contain @ characters, and the later substitutions would treat them as if they were AC, giving false positives. However, since no shell variable can ever contain a NUL (\x00) character, NUL is likely a good character to use in the above workaround instead of @:

$ echo 'ssABteAstACABnnACss' | sed 's/AC/\x00/g; s/AB[^\x00]*\x00/XXX/; s/\x00/AC/g'
ssXXXABnnACss
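A small check of the false-positive problem with @, assuming GNU sed: on an input that already contains @, the @ version stops too early and corrupts the result, while the NUL version gives the same answer as Perl's s/AB.*?AC/XXX/. The input string is made up for illustration.

```shell
#!/bin/sh
s='ssAB@xACyy'   # made-up input that already contains '@'

at_out=$(printf '%s\n' "$s" | sed 's/AC/@/g; s/AB[^@]*@/XXX/; s/@/AC/g')
nul_out=$(printf '%s\n' "$s" | sed 's/AC/\x00/g; s/AB[^\x00]*\x00/XXX/; s/\x00/AC/g')

printf '@ placeholder:   %s\n' "$at_out"    # ssXXXxACyy (wrong: stopped at the original @)
printf 'NUL placeholder: %s\n' "$nul_out"   # ssXXXyy (same as the Perl version)
```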

The use of NUL requires GNU sed. (To make sure that GNU features are enabled, the environment variable POSIXLY_CORRECT must not be set.)

If you are using sed with GNU's -z flag to handle NUL-separated input, such as the output of find ... -print0, then NUL never appears in the pattern space, so it is a good choice of placeholder here.
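A sketch of the -z case, assuming GNU sed: the two NUL-terminated records below are made up and stand in for NUL-separated input, and tr is used only to make the result readable on a terminal.

```shell
#!/bin/sh
# Two NUL-terminated records. With -z, sed splits records on NUL, so no NUL
# is ever inside a record and it is safe to use as the placeholder.
out=$(printf 'ssABteAstACABnnACss\0fooABxACbar\0' |
  sed -z 's/AC/\x00/g; s/AB[^\x00]*\x00/XXX/; s/\x00/AC/g' |
  tr '\0' '\n')

printf '%s\n' "$out"
# ssXXXABnnACss
# fooXXXbar
```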

Although NUL cannot be stored in a bash variable, it can be produced by a printf command. If your input can contain any character at all, including NUL, see Stéphane Chazelas' answer, which adds a clever escaping method.
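To see why a NUL already present in the data breaks the trick, here is a made-up input built with printf, assuming GNU sed: the original NUL is indistinguishable from the placeholder, so the last substitution turns it into a spurious AC. This sketch only demonstrates the failure; it does not implement any escaping method.

```shell
#!/bin/sh
# printf can emit a NUL byte even though a shell variable cannot hold one.
out=$(printf 'a\0bABxACy' |
  sed 's/AC/\x00/g; s/AB[^\x00]*\x00/XXX/; s/\x00/AC/g')

printf '%s\n' "$out"   # aACbXXXy (the original NUL was rewritten as AC)
```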
