How to remove line breaks (or carriage returns) only from certain parts of a block of text


Whenever I copy text from a PDF file that is laid out with hard line breaks (or carriage returns), I need a way to remove those line breaks without losing the paragraph structure.

To do this I need a regular expression (RegEx) that removes only the line breaks that aren't preceded by a period.

For example, a line break right after a period is almost always a legitimate break that starts a new paragraph. A line break mid-word, or after a word with no period, is simply part of the bad formatting I need to get rid of.

My problem is that I don't know how to write a RegEx that removes the ^p marks in Word (or CRLFs, or line breaks in any other format) while leaving alone the ones that follow a period.

Best Answer

Solution for MS Word:

  1. Open Find & Replace (Ctrl+H) and check the "Use wildcards" option. If you don't see the "Use wildcards" option, click "More".
  2. Copy the following into the "Find what" box: ([!.])^0013
  3. Copy the following into the "Replace with" box: \1
  4. Click "Replace All"

Explanation:

  • [!.] matches any single character except a period
  • ^0013 matches character code 13, which is a paragraph mark, so the "Find what" pattern matches every non-period character that is followed by a paragraph mark
  • The parentheses store that non-period character so it can be reused in the replacement
  • \1 puts the stored character back in place of the whole match

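Note that ^0013 is outside the parentheses, so it is not part of what \1 puts back, and the matched paragraph marks are dropped from the final text. The lines are joined with nothing between them; to join them with a space instead, put a space after \1 in the replacement.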
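
Outside Word, the same rule can be sketched with an ordinary regular expression. The Python below is only an illustration and not part of the Word solution: the function name join_broken_lines and its joiner parameter are made up for this example. It normalizes CRLF and lone CR to LF first, then uses a negative lookbehind instead of a capture group to skip breaks that follow a period. One small difference from the Word pattern is that a break at the very start of the text is also removed.

```python
import re

# A line feed that is NOT immediately preceded by a period.
_BREAK_NOT_AFTER_PERIOD = re.compile(r"(?<!\.)\n")


def join_broken_lines(text: str, joiner: str = "") -> str:
    """Remove line breaks except those that directly follow a period.

    `joiner` is inserted where a break is removed ("" mirrors the Word
    replacement \\1; pass " " to separate the joined lines with a space).
    """
    # Normalize Windows (CRLF) and old Mac (CR) line endings to LF so the
    # lookbehind sees the character that really precedes the break.
    normalized = text.replace("\r\n", "\n").replace("\r", "\n")
    return _BREAK_NOT_AFTER_PERIOD.sub(joiner, normalized)


if __name__ == "__main__":
    sample = "Line breaks mid-sen\r\ntence are removed.\r\nThis starts a new paragraph."
    print(join_broken_lines(sample))
```

Like the Word pattern, this treats only a period as a paragraph ending, so a paragraph that ends with a question mark or a closing quote would be merged with the next one. Widening the lookbehind to other characters is left as an adjustment for the text at hand.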
