Ubuntu – Better PDF-to-text converter than pdftotext

Tags: conversion, pdf

I'm using pdftotext (part of poppler-utils) to convert PDF documents to text. It works, for the most part, but one thing I wish it did was to insert blank lines between separate paragraphs instead of mashing them together.

Is there a way to get pdftotext to do this? If not, is there another PDF-to-text utility that can?
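If you want to stay with pdftotext, one workaround is to post-process its output. The sketch below is a hypothetical heuristic, not a pdftotext feature. It assumes that a line ending in terminal punctuation and noticeably shorter than the widest line probably closes a paragraph, and it inserts a blank line after each such line. The helper names and the `short_ratio` threshold are made up for illustration. The only real interface it relies on is that `pdftotext file.pdf -` writes the text to stdout.

```python
import subprocess
import sys


def pdf_to_text(pdf_path: str) -> str:
    # "-" as the output file makes pdftotext write to stdout.
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"], capture_output=True, text=True, check=True
    )
    return result.stdout


def add_paragraph_breaks(text: str, short_ratio: float = 0.8) -> str:
    """Insert a blank line after lines that look like the end of a paragraph.

    Hypothetical heuristic: a line ending in . ? ! or : that is shorter than
    short_ratio * (widest line) is treated as the last line of a paragraph.
    """
    # pdftotext separates pages with form feeds; treat them as line breaks.
    lines = text.replace("\f", "\n").splitlines()
    widest = max((len(line.rstrip()) for line in lines), default=0)

    out = []
    for line in lines:
        stripped = line.rstrip()
        out.append(stripped)
        if stripped.endswith((".", "?", "!", ":")) and len(stripped) < short_ratio * widest:
            out.append("")

    # Drop consecutive blank lines that the insertion may have created.
    result = []
    for line in out:
        if line == "" and result and result[-1] == "":
            continue
        result.append(line)
    return "\n".join(result).strip("\n") + "\n"


if __name__ == "__main__" and len(sys.argv) > 1:
    print(add_paragraph_breaks(pdf_to_text(sys.argv[1])), end="")
```

This will misfire on justified text whose paragraphs happen to end near full width, and on short lines such as headings that end with a colon. Treat `short_ratio` as a knob to tune per document.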

Best Answer

You could try ebook-convert from Calibre.

If anything, I'd say it errs in the other direction: too many line breaks.
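If the extra breaks from ebook-convert show up as runs of several blank lines, a small cleanup pass can squeeze them down. The sketch below assumes Calibre is installed. It calls `ebook-convert input.pdf output.txt`, where Calibre chooses the output format from the file extension, and then reduces every run of blank lines to a single one. The function names are hypothetical. This will not help if the excess breaks fall inside paragraphs rather than between them.

```python
import re
import subprocess
import sys
from pathlib import Path


def ebook_convert_to_text(pdf_path: str, txt_path: str) -> str:
    # ebook-convert infers the output format (TXT) from the .txt extension.
    subprocess.run(["ebook-convert", pdf_path, txt_path], check=True)
    return Path(txt_path).read_text(encoding="utf-8", errors="replace")


def squeeze_blank_lines(text: str) -> str:
    # Strip trailing spaces/tabs so whitespace-only lines become truly empty...
    text = re.sub(r"[ \t]+\n", "\n", text)
    # ...then collapse any run of two or more blank lines into exactly one.
    return re.sub(r"\n{3,}", "\n\n", text)


if __name__ == "__main__" and len(sys.argv) > 2:
    print(squeeze_blank_lines(ebook_convert_to_text(sys.argv[1], sys.argv[2])), end="")
```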

Another option I'd definitely consider, though, is converting to HTML with pdfreflow and then converting the HTML to TXT.
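For the HTML-to-TXT step, HTML already marks paragraphs explicitly, so a converter can put a blank line between them. The sketch below uses only Python's standard `html.parser`. It assumes pdfreflow (or any other tool) has already produced an HTML file; check pdfreflow's own documentation for how to invoke it. The class name and the set of tags treated as paragraph boundaries are illustrative choices, not anything pdfreflow defines.

```python
import sys
from html.parser import HTMLParser
from pathlib import Path

# Tags assumed to start/end a separate paragraph (an illustrative choice).
BLOCK_TAGS = {"p", "div", "h1", "h2", "h3", "h4", "h5", "h6", "li", "tr"}


class ParagraphTextExtractor(HTMLParser):
    """Collect visible text, starting a new paragraph at block-level tags."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True, so &amp; arrives as "&"
        self.paragraphs: list[str] = []
        self._current: list[str] = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def _flush(self) -> None:
        # Join the collected chunks and collapse internal whitespace/newlines.
        text = " ".join("".join(self._current).split())
        if text:
            self.paragraphs.append(text)
        self._current = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "br":
            self._current.append(" ")  # line break inside a paragraph
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        if not self._skip_depth:
            self._current.append(data)


def html_to_text(html: str) -> str:
    parser = ParagraphTextExtractor()
    parser.feed(html)
    parser.close()
    parser._flush()  # keep any trailing text outside a block tag
    if not parser.paragraphs:
        return ""
    # One blank line between paragraphs, which is what the question asks for.
    return "\n\n".join(parser.paragraphs) + "\n"


if __name__ == "__main__" and len(sys.argv) > 1:
    html = Path(sys.argv[1]).read_text(encoding="utf-8", errors="replace")
    print(html_to_text(html), end="")
```

How well this works depends on whether the HTML wraps each paragraph in its own `<p>` (or similar) element. If every line comes out as a separate `<div>`, you would need a different grouping rule.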