I want to remove the newline between two html tags which exists as follows:
<font>
</font>
I want to remove the newline character such that it becomes:
<font></font>
Also, there might be cases where there is more than one newline:
<font>

</font>
I want to remove those as well, so that it becomes:
<font></font>
One more scenario: if the pattern is like:
<font>
This is a text
</font>
After conversion, it should become:
<font>This is a text</font>
All of the above scenarios are resolved if we remove only the newlines between two HTML tags. Other whitespace should not be considered.
I have found a couple of ways to do it using sed, but they are very time consuming and very inefficient performance-wise, particularly if the file has 1000+ HTML tags.
Best Answer
This sed command should help you:
The command looks for a <font> tag that is followed by whitespace up to the end of the line. Then the next line is pulled into the pattern space; then a possibly existing sequence <font>[[:space:]]\+</font> is replaced and the script restarts from the beginning. If the pattern space does not match the address /<font>[[:space:]]*$/, i.e. some non-space content is present after a <font> tag, then the pattern space is printed out and cleared at the end of the sed script, and the process restarts.
Edit: Performance measurement.
I filled a file with the following content repeated 10k times:
620 KB in total. The timings of the script above on a 1.4 GHz A8-4500M are:
Edit2:
Your last question update is much more easily solved with perl, and the performance is 10 times better, as the other answer showed:
Credits to @spasic