Word format change

ms office

I am looking at some Word files last edited by a former assistant. With "Show all non-printing characters" off, they look OK apart from some strange font changes. Turning it on reveals formatting that makes the page look right but was clearly not typed in by a person. For example, the original document had line numbering every five lines. This version looks as if it has line numbering, but the numbers are just typed characters, and there is a section break just before every fifth line. To make even a modest edit, I am exporting everything as text and starting over with the formatting. Any idea how this could happen? Might it have come from OCRing a PDF? Is there any magic way to correct it?

Best Answer

Yes, that sort of formatting sounds like what can happen when your OCR software is set to interpret what it has scanned, when in fact you really just need the plain text for applying styles.

A long time ago on a Windows system, we had to recover an electronic document from the sole remaining printed copy that we had. The software had a marquee feature, where it would:

  1. Scan the page and show you the preview
  2. Allow you to draw rectangular marquees over the portions that you were interested in (this was a convenient tool for ignoring scan fragments)
  3. Produce text fields that contain the scanned text
  4. Provide a button that would copy the text to the Clipboard

The catch with copying to the clipboard was this: if you clicked the button, some interpretation took place that introduced passable formatting, but you no longer had plain text.

However, in step (3), if you drag-selected the text within the paragraph field and copied it to the clipboard manually, you'd get plain text, to which it was easier to assign styles.

With respect to Mac software, try a demo of the latest Adobe Acrobat and see whether it does a decent job of OCR on a screenshot or scan of those Word files.
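
If you do go the plain-text route, the typed-in line numbers will most likely come along with the text, since they are ordinary characters. Below is a rough Python sketch for stripping them before you reapply styles. It is not a tested recipe for your files: the function name `strip_fake_line_numbers` is made up for illustration, and it assumes each fake number is a multiple of five sitting at the very start of a line, followed by whitespace. Check a sample of your export first and adjust the pattern to match what you actually see.

```python
import re

# Assumption: a fake line number is a run of digits at the start of a line,
# optionally indented, followed by at least one whitespace character.
FAKE_NUMBER = re.compile(r"^\s*(\d+)\s+")


def strip_fake_line_numbers(text: str, step: int = 5) -> str:
    """Remove leading numbers that are positive multiples of `step`."""
    cleaned = []
    for line in text.splitlines():
        match = FAKE_NUMBER.match(line)
        if match:
            number = int(match.group(1))
            # Only strip numbers that fit the "every fifth line" pattern.
            if number > 0 and number % step == 0:
                line = line[match.end():]
        cleaned.append(line)
    return "\n".join(cleaned)
```

Be aware that this will also strip a genuine number from a line that happens to begin with, say, "10 people attended", so compare the result against the original before trusting it.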