Whenever I copy text from a PDF that is formatted with hard line breaks (or carriage returns), I need a way to remove those line breaks without losing the paragraph formatting.
To do this I need to use RegEx (regular expressions) to remove only the line breaks that aren't preceded by a period.
For example, if a line break comes right after a period, it is almost always a legitimate break that starts a new paragraph. If a line break falls mid-word or after a word with no period, it's simply part of the bad formatting I need to get rid of.
My problem is that I don't know how to write a RegEx that removes the ^p marks in Word (or CRLFs, or line breaks in any other format) while skipping the ones that follow a period.
Best Answer
Solution for MS Word (Find and Replace, with "Use wildcards" checked):
Find what:
([!.])^0013
Replace with:
\1
Explanation:
[!.] means "find every character except a period".
^0013 is a paragraph mark, so the "Find what" pattern matches every non-period character followed by a paragraph mark.
Note that ^0013 is not inside the parentheses, so \1 puts back only the captured character and the paragraph mark is left out of the final text.
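If the text is being cleaned up outside Word, the same idea can be sketched with an ordinary regular expression. The Python snippet below is only an illustration, not part of the Word solution: the function name unwrap_lines and its joiner parameter are made up for this example. It mirrors the Word pattern by capturing one non-period character before the line break and putting only that character back.

```python
import re


def unwrap_lines(text: str, joiner: str = "") -> str:
    """Remove line breaks that do not directly follow a period.

    `unwrap_lines` and `joiner` are hypothetical names for this sketch.
    `joiner` is inserted where a break is removed; the default "" mirrors
    the Word replacement \\1, which inserts nothing.
    """
    # Normalise CRLF and lone CR to LF so one pattern covers all formats.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # ([^.]) captures one character that is not a period (like ([!.]) in Word);
    # the following \n is outside the group, so it is dropped on replacement.
    return re.sub(r"([^.])\n", lambda m: m.group(1) + joiner, text)


sample = "First line\r\nwraps here.\r\nSecond paragraph."
print(unwrap_lines(sample))
print(unwrap_lines(sample, joiner=" "))
```

With the default empty joiner, words on either side of a removed break run together, exactly as they would with \1 in Word. Passing joiner=" " (or using "\1 " as the replacement in Word) keeps a space between them, at the cost of adding a stray space when the break really was mid-word.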