How to remove newline character between two strings \n in unix

html, newlines, sed, text processing

I want to remove the newline between two html tags which exists as follows:

<font>
</font>

I want to remove the newline character such that it becomes:

<font></font>

Also, there might be cases where there is more than one newline:

<font>

</font>

I want to remove those as well, so that it looks like:

<font></font>

One more scenario: if the pattern is like:

<font>
This is a text
</font>

After conversion, it should become:

<font>This is a text</font>

All the above scenarios are resolved if we just remove only the newlines between two html tags. We should not be considering any white spaces.

There are a couple of ways I have found to do it using sed, but they are very time-consuming and very inefficient performance-wise, particularly if the file has 1000+ html tags.

Best Answer

This sed command should help you:

sed -e ':1;/<font>[[:space:]]*$/{N;s#<font>[[:space:]]\+</font>#<font></font>#g;b1}' file

The command looks for a <font> tag that is followed only by whitespace up to the end of the line. The next line is then pulled into the pattern space (N), the substitution replaces a possibly existing sequence <font>[[:space:]]\+</font> with <font></font>, and the script restarts from the beginning (b1). If the pattern space does not match the address /<font>[[:space:]]*$/, i.e. some non-space content is present after a <font> tag, then the pattern space is printed and cleared at the end of the sed script, and the process restarts with the next line.
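As a quick check of the loop described above, the following sketch feeds two of the question's cases through the same command. It assumes GNU sed (the \+ quantifier and a ';' directly after a label are GNU extensions); the sample input is made up for illustration.

```shell
# Two made-up cases: tags split by one newline, and by several blank lines.
# Assumes GNU sed; other sed implementations may reject this one-liner syntax.
result=$(printf '<font>\n</font>\n<font>\n\n\n</font>\n' |
  sed -e ':1;/<font>[[:space:]]*$/{N;s#<font>[[:space:]]\+</font>#<font></font>#g;b1}')

# Both cases should collapse to an empty <font></font> pair on one line each.
printf '%s\n' "$result"
```

Each pass through the loop appends one more input line, so a run of blank lines is collected until the closing tag arrives and the substitution can match.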

Edit: Performance measurement.

I filled a file with the following content repeated 10k times:

<font>
dejidewji
</font>
<font>



</font><font>





</font>

620 KB in total. The timings of the script above on a 1.4 GHz A8-4500M are:

real    0m0.361s
user    0m0.356s
sys     0m0.005s
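To reproduce a measurement like this, a test file can be generated along the following lines. This is only a sketch: the temporary file and variable names are hypothetical, the 62-byte block is copied from the sample content above, and GNU sed is assumed.

```shell
# Hypothetical temp file holding the sample block repeated 10,000 times.
testfile=$(mktemp)
i=0
while [ "$i" -lt 10000 ]; do
  # One copy of the sample block: 62 bytes including newlines.
  printf '<font>\ndejidewji\n</font>\n<font>\n\n\n\n</font><font>\n\n\n\n\n\n</font>\n'
  i=$((i + 1))
done > "$testfile"

size=$(wc -c < "$testfile")
echo "bytes: $size"

# Run the sed command from the answer over the generated file (GNU sed assumed).
lines=$(sed -e ':1;/<font>[[:space:]]*$/{N;s#<font>[[:space:]]\+</font>#<font></font>#g;b1}' "$testfile" | wc -l)
echo "output lines: $lines"

rm -f "$testfile"
```

In bash, prefixing the sed line with time prints real/user/sys figures in the format shown above; the actual numbers depend on the machine.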

Edit2:

Your last question update is much more easily solved with perl, and the performance is 10 times better, as the other answer showed:

perl -0777 -pe 's|<font>\s+|<font>|g;s|\s+</font>|</font>|g' file

Credits to @spasic

Related Solutions

Why does this awk command not play as well with find as sed does

First of all, you need to end the -exec action with {} \;.

Second, awk does not modify the file in place as sed does (with the -i option), so you should send the output to a temporary file, then move this over the original file.

Create a script (say we call it replace) with the following content:

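#!/bin/sh
# Write the edited copy to a temporary file, then replace the original.
# The final !p rule prints every line outside the marker block (assumed
# intent; without it p is never used and all other lines are dropped).
tfile=$(mktemp)
awk '/<!-- STARTREPLACE1 -->/{p=1;print;print "A whole new world!"}
     /<!-- ENDREPLACE1 -->/  {p=0}
     !p' "$1" >"$tfile" && \
  mv "$tfile" "$1"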

give it executable permissions

chmod +x ./replace

then run

find "$DIR" -type f -iname '*.html' -exec ./replace {} \;
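To see what such an awk program does before pointing it at real files, it can be tried on a few made-up lines. This sketch assumes the intent is to keep everything outside the two markers and swap the text between them, which is why it includes a final !p rule; that rule is an assumption about the intended behavior.

```shell
# Made-up input: one line before, a marker block with old text, one line after.
result=$(printf '%s\n' 'before' '<!-- STARTREPLACE1 -->' 'old text' '<!-- ENDREPLACE1 -->' 'after' |
  awk '/<!-- STARTREPLACE1 -->/ {p=1; print; print "A whole new world!"}
       /<!-- ENDREPLACE1 -->/   {p=0}
       !p')

# Lines between the markers are replaced; everything else passes through.
printf '%s\n' "$result"
```

The flag p is set while inside the block, so !p suppresses the old lines and lets all other lines, including the end marker, through.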

Sed Newlines – Fix Sed Failing to Remove Newline Character

sed delimits on \newlines - they are always removed on input and reinserted on output. There is never a \newline character in a sed pattern space which did not occur as a result of an edit you have made. Note: with the exception of GNU sed's -z mode...

Just use tr:

echo ls | tr -d \\n | xclip -selection clipboard

Or, better yet, forget sed altogether:

printf ls | xclip -selection clipboard
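The difference is easy to observe by counting bytes instead of piping to xclip. In this sketch (variable names are made up) the sed substitution never sees the trailing newline, so it changes nothing, while tr and printf produce output without it.

```shell
# sed strips the newline on input and re-adds it on output, so s/\n// is a no-op here.
with_sed=$(printf 'ls\n' | sed 's/\n//' | wc -c)

# tr works on the raw byte stream and really deletes the newline.
with_tr=$(printf 'ls\n' | tr -d '\n' | wc -c)

# printf never emits a newline unless asked to.
with_printf=$(printf ls | wc -c)

echo "sed: $with_sed bytes, tr: $with_tr bytes, printf: $with_printf bytes"
```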
