Why does this add spaces? echo "x ax" | sed 's/x\s*/x /'

regular expression, replace, sed

I want to find an x and replace the 0 or more spaces that follow it (\s*) with just a single space.

echo "x ax" | sed 's/x\s*/x /'

For some reason, instead of replacing the spaces with the single space, it just appends one space to however many existed there before:

x  ax

Using + instead of * appears to do absolutely nothing, regardless of whether I use the -E flag.

It appears that sed doesn't do non-greedy expressions, so why doesn't this * consume all of the spaces when matching?

I'm a regex ninja in non-bash settings, but bash and its tools eat me alive. I've got no idea how to concisely phrase this for a successful search engine query.

Best Answer

sed expects a basic regular expression (BRE). \s is not a standard special construct in a BRE (nor in an ERE, for that matter); it is an extension from some languages, in particular Perl (which many others imitate). POSIX leaves the meaning of a backslash followed by an ordinary character such as s undefined, so what \s does in sed depends on the implementation.

In your implementation, it appears that \s matches s, so \s* matches 0 or more s, and x\s* matches just the x in your sample input. Hence x ax is transformed to x  ax (and xy would be transformed to x y and so on). In other implementations (e.g. GNU sed), \s is supported as an extension that matches whitespace, so x\s* consumes the space after the first x and the replacement puts a single space back, which leaves this particular line looking unchanged.

This has absolutely nothing to do with greediness. Greediness doesn't influence whether a string matches a regex, only what portion of the string is captured by a match.
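
As a sketch of a portable alternative (not part of the original answer), the POSIX bracket expression [[:space:]] can stand in for \s. It is defined for both BRE and ERE, so it should behave the same in any POSIX-conforming sed. The variable name result and the three-space sample input are only for illustration.

```shell
# [[:space:]] is a POSIX character class, so it does not depend on
# whether this sed understands the \s extension.
# Sample input (illustrative): "x", three spaces, then "ax".
result=$(printf '%s\n' 'x   ax' | sed 's/x[[:space:]]*/x /')

# The run of spaces after the first x is collapsed to a single space.
printf '%s\n' "$result"
```

Without the g flag only the first x is affected; adding g would also append a space after the trailing x.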

Related Solutions

Why isn’t this sed regex matching

The 1 in the number 10 matches [^049] so it's deleted.

Why isn’t sed greedy in this simple case

The .* is greedy first -- it's matching foo 6. The only reason it stops there is because matching any further would stop the whole pattern from matching, so it leaves the 5 for the ([0-9]+). If you made it ([0-9]*) instead the .* would match the whole line and you'd get nothing in your group. One way around it is to tell the first part not to match numbers:

$ echo "foo 65 bar" | sed -n -e 's/[^0-9]*\([0-9]\+\).*/\1/p'
65
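
To see the difference side by side, here is a small sketch that runs both patterns on the same input. It assumes nothing beyond a POSIX sed; [0-9][0-9]* is used in place of \+ because \+ is an extension in a BRE. The variable names greedy and fixed are made up for the example.

```shell
# Greedy prefix: .* takes "foo 6" and leaves one digit for the group.
greedy=$(printf '%s\n' 'foo 65 bar' | sed -n 's/.*\([0-9][0-9]*\).*/\1/p')

# Digit-free prefix: [^0-9]* stops before the number, so the group
# captures every digit.
fixed=$(printf '%s\n' 'foo 65 bar' | sed -n 's/[^0-9]*\([0-9][0-9]*\).*/\1/p')

printf 'greedy prefix:     %s\n' "$greedy"
printf 'digit-free prefix: %s\n' "$fixed"
```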
