Convert PDF with embedded fonts to EMF for PowerPoint

conversion, emf, fonts, microsoft-powerpoint, pdf

Is there a free (i.e. gratis) way to convert a PDF file to Windows EMF (Enhanced Metafile) such that text using fonts embedded in the PDF is rendered the same way in MS Office PowerPoint? I guess one would have to replace the text with filled paths, but that would be all right since I only want to show the result, not edit it.

I tried pstoedit, but the font embedding seems to be tricky. According to the manual's section on font handling, -dt should turn text into filled paths, but in this case the paths are apparently just polygons: they connect the segment end points with straight lines instead of drawing Bézier curves in between. So the result looks strange, e.g. the dots of every ‘i’ come out as diamonds.

I've read in several places (e.g. here) that inkscape could be used to convert PDF to EMF. But on Windows the PDF import hangs without showing a dialog. On Linux, I get an import dialog, but the only option for text handling is to leave text as text. I can't convert it to paths, so without the embedded fonts I'm forced to fall back to system fonts.

I've also tried ImageMagick convert, but that seems to rasterize the image so the result looks blurry.

For one application, namely embedding LaTeX formulas into PowerPoint, this post suggests alternatives (at least some of which work via DVI instead of PDF, and MHTML instead of EMF). But there are many more tools which can create PDF but not EMF, so the general problem remains.

I have access to Windows, Linux and OS X, so a suggested answer may use any combination of OS if that helps. If you don't have a complete solution, then a partial solution may still help. E.g. some PDF-to-PDF converter which replaces text with filled paths. Or some tool to extract fonts from PDF and save them in separate files, where other tools (like pstoedit or inkscape) might pick them up and use them to render the texts. Or anything else you consider a significant step towards a solution.

Best Answer

I faced the same problem as you: I had a number of .pdf files (two pages each) that I wanted to convert into something I could import into a Word file. That something turned out to be .emf in the end (none of the other formats was accepted).

This answer assumes you are comfortable using the console.

The tool of choice to convert vector format X into vector format Y seems to be inkscape. However, when importing a .pdf file directly into inkscape

  • you can only access the first page from the console (to the best of my knowledge)
  • even if you select the text-to-paths option flag -T, the text is not converted well.

Therefore, I found it necessary to pre-convert the .pdf file into something inkscape is able to use. I found this answer very useful, especially the mention of pdf2svg. My final sequence was the following:

pdf2svg input_filename.pdf interim_filename_%d.svg all
inkscape -T interim_filename_1.svg --export-emf=interim_filename_1.emf
# repeat the inkscape call for each additional page of the .pdf
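
The per-page repetition can be scripted. What follows is only a minimal sketch, assuming a POSIX shell and the same inkscape options as in the commands above; the function name pdf_to_emf and its second argument (the prefix for the interim files) are made up for illustration and are not part of either tool.

```shell
#!/bin/sh
# Sketch: convert every page of a PDF into its own EMF file.
# pdf_to_emf is a hypothetical helper name, not part of pdf2svg or inkscape.
pdf_to_emf() {
    input=$1
    prefix=${2:-page}

    # Step 1: write one SVG per page; pdf2svg replaces %d with the page number.
    pdf2svg "$input" "${prefix}_%d.svg" all || return 1

    # Step 2: convert each SVG to EMF, turning text into paths (-T).
    for svg in "${prefix}"_*.svg; do
        [ -e "$svg" ] || continue    # the glob matched nothing
        inkscape -T "$svg" --export-emf="${svg%.svg}.emf" || return 1
    done
}
```

Calling pdf_to_emf input_filename.pdf interim_filename should then leave interim_filename_1.emf, interim_filename_2.emf and so on next to the interim SVG files. Note that Inkscape 1.0 and later reworked the command line, so --export-emf may need to be replaced there (I believe by --export-filename with an .emf name); inkscape --help shows what the installed version accepts.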

After transferring those .emf files to a Windows machine and opening them in the Windows image viewer, the result is identical to the input, as far as I can tell on screen. I also tried a test case with a custom-made LaTeX document using a font not present on my Windows machine, and found the result identical as well. Skipping the initial pdf2svg step meant that the spacing was completely messed up after inkscape’s conversion.

In my case, I did not need to remove the .pdf page boundaries (I was dealing with full-page files). You may require such an intermediate step if you are interested in only a small part of the .pdf page. Pulling from this answer, pdfcrop seems to be able to do that.
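
If cropping is needed, it would go before the pdf2svg step. A rough sketch, assuming the pdfcrop script that ships with TeX Live (called as pdfcrop INPUT OUTPUT, it trims each page to the bounding box of its content); the function name crop_and_split and the _cropped suffix are placeholders of mine:

```shell
#!/bin/sh
# Sketch: crop the page margins first, then split the cropped PDF into SVGs.
# crop_and_split is a hypothetical helper name; pdfcrop and pdf2svg do the work.
crop_and_split() {
    input=$1
    base=${input%.pdf}
    cropped=${base}_cropped.pdf

    # Trim every page to the bounding box of its content.
    pdfcrop "$input" "$cropped" || return 1

    # Continue as before, but starting from the cropped file.
    pdf2svg "$cropped" "${base}_%d.svg" all
}
```

The resulting SVG files can then be passed to inkscape exactly as in the sequence above.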
