I need to convert about 500k emails into searchable PDFs. By 'searchable' I mean that macOS will be able to scan them for specific words rather than simply treating them as images. So far, my searches for a tool to do this have turned up only proprietary database apps and overpriced, sketchball x-to-pdf converters that basically replicate the built-in macOS Print To PDF function. Is there a single tool, or two complementary tools, that could be used in Terminal to batch convert all the emails to searchable PDFs?
Command Line Tool to Batch Convert .EML/.EMLX/.MBOX to Searchable PDFs
email, file conversion, ocr, pdf
Best Answer
I had to do this with ~180 emails, and I used a command-line tool I found on GitHub that converts .eml to .pdf via .html: https://github.com/nickrussler/eml-to-pdf-converter
It takes a little while to convert each .eml file (22 minutes for 186 emails with lots of images, or roughly 7 seconds per email), so it's probably not helpful for a 500k email task: at that rate a single process would need around 41 days. (Maybe if you're reeeally not in a rush and not afraid of multiprocessing!) If it is helpful for you or anyone else, though, here's how I got it to work on the bash command line:
- git clone the repo.
- Install the wkhtmltopdf tool from a binary (installing with pip is insufficient) from here: https://wkhtmltopdf.org/downloads.html
- From within the cloned repo, generate the email converter .jar file: ./gradlew shadowJar
- Run a for loop to convert every file in the .mbox (or a directory of .eml):
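A minimal sketch of such a loop, for the directory-of-.eml case. The jar path, the ./emails directory, and the assumption that the converter takes the .eml path as its only argument are all guesses rather than details taken from the repo, so check its README and adjust them before running.

```shell
#!/usr/bin/env bash
# Sketch only: convert every .eml file in a directory, one file at a time.
# ASSUMPTION: the jar built by ./gradlew shadowJar sits at this hypothetical
# path; look in build/libs/ for the real file name and override JAR.
JAR="${JAR:-build/libs/emailconverter-all.jar}"
# ASSUMPTION: the emails live in ./emails; override EML_DIR otherwise.
EML_DIR="${EML_DIR:-./emails}"

# Convert a single file.
# ASSUMPTION: the converter accepts the .eml path as its only argument.
convert_one() {
  java -jar "$JAR" "$1"
}

# Loop over every .eml in the given directory, report failures on stderr,
# and return non-zero if any file failed to convert.
convert_all() {
  local dir="$1" f failed=0
  for f in "$dir"/*.eml; do
    [ -e "$f" ] || continue   # glob matched nothing, so skip the literal pattern
    if ! convert_one "$f"; then
      echo "failed: $f" >&2
      failed=$((failed + 1))
    fi
  done
  [ "$failed" -eq 0 ]
}

convert_all "$EML_DIR"
```

Quoting "$f" keeps file names with spaces intact, and the loop carries on after a failed conversion so one bad email does not stop the batch. For a much larger batch, the same convert_one step could be fanned out across several processes (for example with xargs -P), at the cost of heavier CPU and memory use.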