Why does this add spaces? echo "x ax" | sed 's/x\s*/x /'

regular expression, replace, sed

I want to find an x and replace the 0 or more spaces that follow it (\s*) with just a single space.

echo "x ax" | sed 's/x\s*/x /'

For some reason, instead of replacing the spaces with the single space, it just appends one space to however many existed there before:

x  ax

Using + instead of * appears to do absolutely nothing, regardless of whether I use the -E flag.

It appears that sed doesn't do non-greedy expressions, so why doesn't this * consume all of the spaces when matching?

I'm a regex ninja in non-bash settings, but bash and its tools eat me alive. I've got no idea how to concisely phrase this for a successful search engine query.

Best Answer

sed expects a basic regular expression (BRE). \s is not a standard special construct in a BRE (nor in an ERE, for that matter); it is an extension found in some languages, in particular Perl (which many others imitate). In sed implementations that don't support it as an extension, \s stands either for the literal string \s or for the literal character s.
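To see which interpretation your own sed uses, a small probe along these lines may help. This is only a sketch: probe_backslash_s is a hypothetical helper name, and the three cases it distinguishes are assumptions about the behaviours likely to be encountered, not an exhaustive list.

```shell
# Hypothetical helper: classify how the local sed interprets \s in a BRE.
probe_backslash_s() {
  if [ "$(printf ' \n' | sed 's/\s/W/')" = "W" ]; then
    echo "whitespace"            # \s matched a space (extension supported)
  elif [ "$(printf 's\n' | sed 's/\s/W/')" = "W" ]; then
    echo "literal s"             # \s matched the character s
  elif [ "$(printf '\\s\n' | sed 's/\s/W/')" = "W" ]; then
    echo "literal backslash-s"   # \s matched the two-character string \s
  else
    echo "unknown"
  fi
}

probe_backslash_s
```

The result depends on the sed you run it with, so no expected output is given here.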

In your implementation, it appears that \s matches s, so \s* matches 0 or more s, and x\s* matches only the x in your sample input; hence x ax is transformed to x  ax (and xy would be transformed to x y, and so on). In implementations where \s stands for the literal string \s, \s* matches a backslash followed by 0 or more s, which doesn't occur in your input, so the line is unchanged. GNU sed is different again: it supports \s as an extension that matches whitespace, so there the command collapses the spaces as you intended.
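One way to avoid depending on \s at all is the POSIX character class [[:space:]], which is valid inside a bracket expression in both BRE and ERE. The following is a minimal sketch; squeeze_after_x is a hypothetical name chosen for illustration.

```shell
# Hypothetical helper: replace the whitespace after the first x with one space,
# using the POSIX class [[:space:]] instead of the non-standard \s.
squeeze_after_x() {
  sed 's/x[[:space:]]*/x /'
}

printf 'x   ax\n' | squeeze_after_x   # prints: x ax
printf 'xy\n'     | squeeze_after_x   # prints: x y
```

Without the g flag only the first x on each line is affected, which matches the command in the question.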

This has absolutely nothing to do with greediness. Greediness doesn't influence whether a string matches a regex, only what portion of the string is captured by a match.
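To make the matched portion visible, you can wrap it in brackets using & in the replacement. The sketch below uses a hypothetical helper, show_match, and writes the s literally (xs*) to mimic an implementation where \s means s. It assumes the pattern passed in contains no slash.

```shell
# Hypothetical helper: wrap whatever the BRE in $1 matches in [ ].
# Assumes the pattern contains no '/' character.
show_match() {
  sed "s/$1/[&]/"
}

printf 'x ax\n'    | show_match 'xs*'   # prints: [x] ax      (zero s matched)
printf 'xsss ax\n' | show_match 'xs*'   # prints: [xsss] ax   (* is greedy)
printf 'x   ax\n'  | show_match 'x *'   # prints: [x   ]ax    (all spaces consumed)
```

The first line is the situation from the question: the regex matches, but only the x, so the existing space is left in place and another one is added.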
