Replace Single Newlines – Better Methods Using Sed and Awk

awk, regular expression, sed, text processing

I am in the habit of writing one line per sentence because I typically compile things to LaTeX, or am writing in some other format where line breaks get ignored. I use a blank line to indicate the start of a new paragraph.

Now, I have a file written in this style which I'd like to just send as plain text. I want to remove all the single linebreaks but leave the double linebreaks intact. This is what I've done:

sed 's/^$/NEWLINE/' file.txt | awk '{printf "%s ",$0}' | sed 's/NEWLINE/\n\n/g' > linebreakfile.txt

This replaces each empty line with some text I am confident doesn't appear in the file (NEWLINE). Then awk gets rid of all the line breaks (I found that trick on some website), and finally the second sed replaces each NEWLINE with the requisite two linebreaks.

This seems like a long-winded way to do a pretty simple thing. Is there a simpler way? Also, if there were a way to replace multiple spaces (which sometimes creep in for some reason) with single spaces, that would be good too.

I use emacs, so an emacs-specific trick would be welcome, but I'd rather see a pure sed or pure awk version.

Best Answer

You can use awk like this, which joins the lines of each paragraph and puts each paragraph on its own line:

$ awk ' /^$/ { print; } /./ { printf("%s ", $0); } ' test

Or, if you need a trailing newline at the end of the output:

$ awk ' /^$/ { print; } /./ { printf("%s ", $0); } END { print ""; } ' test

Or, if you want to separate the paragraphs by a blank line:

$ awk ' /^$/ { print "\n"; } /./ { printf("%s ", $0); } END { print ""; } ' test
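To see what the last command does, here is a small worked run. It is only a sketch: the file names sample.txt and joined.txt and the sample sentences are made up for illustration and do not come from the question.

```shell
# Build a small sample file (hypothetical name): one sentence per line,
# with a blank line between paragraphs.
printf 'First sentence.\nSecond sentence.\n\nNew paragraph.\n' > sample.txt

# Join the lines of each paragraph; each blank line becomes a paragraph break.
awk ' /^$/ { print "\n"; } /./ { printf("%s ", $0); } END { print ""; } ' sample.txt > joined.txt

# Show the result: two paragraph lines with one blank line between them.
cat joined.txt
```

Note that every joined sentence is followed by a space, so each paragraph in the output ends with one trailing space.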

These awk commands make use of actions that are guarded by patterns:

/regex/

or

END

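The action that follows a /regex/ pattern is executed only for input lines that match the regex. The action that follows END is executed once, after the last line has been read.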
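A minimal sketch of how the guards select lines, using counters instead of printing. The variable names blank, nonblank and summary and the three-line input are invented for this illustration.

```shell
# Feed three lines (one of them empty) to awk and count which guard fires.
summary=$(printf 'one\n\ntwo\n' | awk '
  /^$/ { blank++ }      # runs only for empty lines
  /./  { nonblank++ }   # runs only for lines with at least one character
  END  { print blank " blank, " nonblank " non-blank" }  # runs once, after all input
')
echo "$summary"
```

Each input line triggers exactly one of the first two actions, which is why the commands above can treat paragraph breaks and ordinary lines differently.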

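The characters ^, $ and . have special meaning in regular expressions: ^ matches the beginning of the line, $ matches the end, and . matches any single character. So /^$/ matches empty lines and /./ matches lines that contain at least one character.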
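The question also asks about squeezing runs of spaces, which the commands above do not do. One possible way to handle both at once is awk's paragraph mode. This is a different technique from the pattern-guarded commands above, shown here only as a sketch; the file names sample2.txt and squeezed.txt are hypothetical.

```shell
# Sample input with stray double and triple spaces (hypothetical file name).
printf 'First  sentence.\nSecond sentence.\n\nNew   paragraph.\n' > sample2.txt

# RS = "" switches awk to paragraph mode: records are separated by blank lines,
# and a newline inside a record counts as a field separator.
# Assigning $1 = $1 makes awk rebuild the record with a single space (OFS)
# between fields, which removes both the line breaks and the extra spaces.
# ORS = "\n\n" writes a blank line after each paragraph.
awk 'BEGIN { RS = ""; ORS = "\n\n" } { $1 = $1; print }' sample2.txt > squeezed.txt

cat squeezed.txt
```

Unlike the printf-based commands, this leaves no trailing space at the end of a paragraph, but it does write a blank line after the last paragraph as well.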
