Sed: delete text from one string up to the first occurrence of another string

regex sed

Imagine I have something like the following text:

The quick brown fox jumps in 2012 and 2013

I would like to delete the part from "fox" through the four-digit number, but only up to the first occurrence of such a number, so that I end up with:

The quick brown and 2013

Something like this…:

echo "The quick brown fox jumps in 2012 and 2013" \
   | sed  "s/fox.*\([0-9]\{4\}\)//g"

…gives me:

The quick brown

So it removed everything up to and including the last occurrence of four digits.

Any ideas?

Best Answer

POSIX regular expressions used by sed (both the "basic" and "extended" versions) do not support non-greedy matches. (Although there are some workarounds, such as using [^0-9]* in place of .*, they become unreliable if the inputs vary a lot.)
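For reference, here is a minimal sketch of that [^0-9]* workaround. It assumes that no digit appears between "fox" and the first four-digit number; if one does, the match fails or lands somewhere else. The variable name result is only for illustration.

```shell
# Workaround sketch: [^0-9]* cannot run past a digit, so the match has to
# stop at the first group of four digits. The trailing " *" also removes the
# space(s) after the number. No g flag, so only the first match on each line
# is replaced.
result=$(echo "The quick brown fox jumps in 2012 and 2013" \
   | sed 's/fox[^0-9]*[0-9]\{4\} *//')
echo "$result"
```

With the sample line this should print "The quick brown and 2013".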

What you need can be achieved in Perl by using the ? non-greedy quantifier:

echo "The quick brown fox jumps in 2012 and 2013" \
   | perl -pe 's/fox.*?([0-9]{4})//g'

You might wish to remove an extra space as well.
