I am trying to write a sed command to squeeze excessive spaces in a file. There should be only one space between words, but leading spaces and tabs should be left alone. So the file:
This is an indented paragraph. The indentation should not be changed.
This is the second line of the paragraph.
Will become:
This is an indented paragraph. The indentation should not be changed.
This is the second line of the paragraph.
I have tried variations of
/^[ \t]*/!s/[ \t]+/ /g
Any ideas would be appreciated.
Best Answer
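A minimal sketch of a command that fits the description that follows, written in basic regular expression syntax. The wrapper function name squeeze_after_words and the sample input are placeholders for illustration, and the sketch assumes a sed that understands \> (GNU sed does):

```shell
# Hypothetical helper: squeeze runs of blanks that follow a word into one space.
# \> is the end-of-word boundary; \{1,\} means "one or more" in basic syntax.
squeeze_after_words() {
    sed 's/\>[[:blank:]]\{1,\}/ /g' "$@"
}

# Made-up sample input: the leading indentation does not follow a word,
# so it is not touched.
printf '    This is     an indented      paragraph.\n' | squeeze_after_words
```

The function reads standard input when called without arguments, or the named files otherwise, for example squeeze_after_words file.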
The expression I used matches one or several [[:blank:]] (spaces or tabs) after a word, and replaces these with a single space. The \> matches the zero-width boundary at the end of a word, between a word character and a following non-word character.
This was tested with OpenBSD's native sed, but I think it should work with GNU sed as well. GNU sed also uses \b for matching word boundaries.
You could also use sed -E to shorten the expression. Again, if \> doesn't work for you with GNU sed, use \b instead.
Note that although the above sorts out your example text in the correct way, it does not quite work for removing spaces after punctuation, such as the spaces that follow the full stop at the end of a sentence.
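To make the sed -E form and the punctuation limitation concrete, here is a short sketch. The function name squeeze_ere and the sample line are made up for illustration, and \> is assumed to be supported by the sed in use:

```shell
# Hypothetical helper: the same idea with -E (extended regular expressions),
# where + can be used for "one or more".
squeeze_ere() {
    sed -E 's/\>[[:blank:]]+/ /g' "$@"
}

# Made-up sample with several spaces after the full stop.
# The blanks after "paragraph." follow a "." rather than a word character,
# so \> does not match there and that run is left as it is.
printf '  An indented   paragraph.    The indentation   stays.\n' | squeeze_ere
```

With GNU sed, the runs between words collapse to one space while the four spaces after "paragraph." remain.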
For that, a slightly more complicated variant would do the trick:
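A minimal sketch of such a variant, assuming sed -E is available. The function name squeeze_after_nonblank and the sample line are hypothetical:

```shell
# Hypothetical helper: capture one non-blank character, then replace it plus
# the blanks that follow with the captured character and a single space.
squeeze_after_nonblank() {
    sed -E 's/([^[:blank:]])[[:blank:]]+/\1 /g' "$@"
}

# Made-up sample: leading blanks have no non-blank character before them,
# so the indentation is never part of a match.
printf '  An indented   paragraph.    The indentation   stays.\n' | squeeze_after_nonblank
```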
This replaces any non-blank character followed by one or more blank characters with the non-blank character and a single space.
Or, using standard sed (and a very tiny optimization in that it will only do the substitution if there are two or more spaces/tabs after the non-space/tab):
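A sketch of how this could look in basic regular expression syntax, without -E. The function name squeeze_posix and the sample input are placeholders, not taken from the original answer:

```shell
# Hypothetical helper: basic-syntax form using \( \) for the capture group
# and \{2,\} for "two or more" blanks.
# Side effect of requiring two or more: a lone tab between two words
# is left as a tab rather than turned into a space.
squeeze_posix() {
    sed 's/\([^[:blank:]]\)[[:blank:]]\{2,\}/\1 /g' "$@"
}

# Made-up sample: a leading tab, a mixed space/tab run, and a double space.
printf '\tfoo \t bar  baz qux\n' | squeeze_posix
```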