Are there more robust tools than Automator to extract text from multiple PDF files?

applescript automator pdf

There is an action in Automator that lets you programmatically "Extract PDF Text", but it fails when fed a moderate number of files (25 to 100). Worse, it fails without logging anything helpful beyond the message "Automator Quit Unexpectedly".

Does anyone know of an equivalent command for doing this in AppleScript? I am looking for tools that give me more control over logging and error handling, so I can process PDF files into text more efficiently.

Best Answer

I don't know how it compares against other options, but you could use pdftotext. It can be installed with brew install xpdf.

do shell script "/usr/local/bin/pdftotext /usr/share/doc/bash/bash.pdf -" without altering line endings
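Since the question asks about logging and error handling across many files, here is a rough sketch of how pdftotext could be wrapped in a shell loop. The function name convert_pdfs, the PDFTOTEXT override variable, and the log format are my own assumptions for illustration, not part of xpdf. It assumes pdftotext is on your PATH unless PDFTOTEXT points somewhere else.

```shell
#!/bin/bash
# Sketch: convert every PDF in a folder to text and log each result.
# Assumptions (not from pdftotext itself): the function name, the
# PDFTOTEXT override variable, and the "OK"/"FAIL" log format.

convert_pdfs() {
    local src_dir="$1" out_dir="$2" log_file="$3"
    local tool="${PDFTOTEXT:-pdftotext}"   # converter binary, overridable
    local failed=0 pdf name

    mkdir -p "$out_dir" || return 2

    for pdf in "$src_dir"/*.pdf; do
        [ -e "$pdf" ] || continue          # glob matched nothing
        name=$(basename "$pdf" .pdf)
        # Send the converter's own error messages to the log as well.
        if "$tool" "$pdf" "$out_dir/$name.txt" 2>>"$log_file"; then
            echo "OK   $pdf" >>"$log_file"
        else
            echo "FAIL $pdf (exit $?)" >>"$log_file"
            failed=$((failed + 1))
        fi
    done

    # Succeed only if every file converted.
    [ "$failed" -eq 0 ]
}
```

After sourcing the file, something like convert_pdfs ~/pdfs ~/text ~/convert.log would process the folder. A single bad PDF is recorded in the log and the loop carries on, which is the main thing the Automator action doesn't give you. The same function could be called from AppleScript with do shell script if you prefer to stay there.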

Calibre also comes with some command line utilities:

/Applications/calibre.app/Contents/MacOS/ebook-convert /usr/share/doc/bash/bash.pdf /tmp/output.txt
