I want to use sed to replace anything in a string between the first AB and the first occurrence of AC (inclusive) with XXX.
For example, I have this string (this string is for a test only):

ssABteAstACABnnACss

and I would like output similar to this: ssXXXABnnACss.
I did this with Perl:
$ echo 'ssABteAstACABnnACss' | perl -pe 's/AB.*?AC/XXX/'
ssXXXABnnACss
But I want to implement it with sed.
The following (using the Perl-compatible regex) does not work:
$ echo 'ssABteAstACABnnACss' | sed -re 's/AB.*?AC/XXX/'
ssXXXss
Best Answer
Sed regular expressions always find the longest match (POSIX leftmost-longest semantics), and sed has no equivalent of a non-greedy quantifier.
What we want to do is match:

1. AB,
2. followed by any amount of anything other than AC,
3. followed by AC.
Unfortunately, sed can't do #2, at least not for a multi-character regular expression. Of course, for a single-character regular expression such as @ (or even [123]), we can do [^@]* or [^123]*. And so we can work around sed's limitations by changing all occurrences of AC to @ and then searching for AB, followed by any number of characters other than @, followed by @.
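A minimal sketch of the @ work-around as a single sed invocation, wrapped in a shell function so it can be called on any string. The function name replace_first_ab_ac is hypothetical, and the sketch assumes the input contains no @ characters of its own:

```shell
# Hypothetical helper name; assumes the input never contains '@' itself.
replace_first_ab_ac() {
    # 1. Turn every AC into the single character @.
    # 2. Match AB, then any run of non-@ characters, then the first @.
    # 3. Turn the remaining @ characters back into AC.
    printf '%s\n' "$1" | sed 's/AC/@/g; s/AB[^@]*@/XXX/; s/@/AC/g'
}

replace_first_ab_ac 'ssABteAstACABnnACss'   # prints: ssXXXABnnACss
```

Only POSIX sed features are used here, so it should also work with non-GNU seds.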
The last substitution changes the remaining @ characters (the AC occurrences that were not consumed by the match) back to AC. But this is a reckless approach because the input could already contain @ characters: a pre-existing @ would end the match early, and the last substitution would also turn it into AC. However, since a bash variable can never contain a NUL (\x00) character, NUL is likely a good character to use in the above work-around instead of @.

The use of NUL requires GNU sed. (To make sure that GNU features are enabled, the user must not have set the environment variable POSIXLY_CORRECT.)
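A sketch of the NUL variant, assuming GNU sed, which accepts \x00 in the replacement and, when POSIXLY_CORRECT is unset, inside a bracket expression as well. The helper name replace_first_ab_ac_nul is hypothetical. The second call shows that a pre-existing @ in the input is now left alone:

```shell
# Hypothetical helper name. Assumes GNU sed: \x00 is a GNU escape, and it is
# honored inside [^...] only when POSIXLY_CORRECT is not set.
replace_first_ab_ac_nul() {
    # A bash argument cannot contain NUL, so the placeholder cannot collide
    # with anything already in the input.
    printf '%s\n' "$1" |
        sed 's/AC/\x00/g; s/AB[^\x00]*\x00/XXX/; s/\x00/AC/g'
}

replace_first_ab_ac_nul 'ssABteAstACABnnACss'     # prints: ssXXXABnnACss
replace_first_ab_ac_nul 'ss@ABte@AstACABnnACss'   # prints: ss@XXXABnnACss
```

With @ as the placeholder, the second input would instead come out as ssACXXXAstACABnnACss, because the original @ both stops the match and is rewritten to AC.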
If you are using sed with GNU's -z flag to handle NUL-separated input, such as the output of find ... -print0, then NUL is the record separator and will not be in the pattern space, so NUL is a good choice for the substitution here.

Although NUL cannot be in a bash variable, it is possible to include it in a printf command. If your input string can contain any character at all, including NUL, then see Stéphane Chazelas' answer, which adds a clever escaping method.
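For the -z case described above, the same three substitutions can be applied to each NUL-separated record. A sketch, assuming GNU sed; the function name replace_in_records is hypothetical, printf '%s\0' stands in for a producer such as find ... -print0, and the final tr only makes the NUL-terminated output readable:

```shell
# Hypothetical helper: reads NUL-separated records on stdin (GNU sed -z).
# With -z, NUL is the record separator, so no record contains NUL and it is
# a safe stand-in for AC inside each record.
replace_in_records() {
    sed -z 's/AC/\x00/g; s/AB[^\x00]*\x00/XXX/; s/\x00/AC/g'
}

printf '%s\0' 'ssABteAstACABnnACss' 'xxABfooACbarACyy' |
    replace_in_records |
    tr '\0' '\n'
# prints:
# ssXXXABnnACss
# xxXXXbarACyy
```

Each record is processed independently, so the first AB...AC span is replaced once per record, not once per stream.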