Extract graphs from PDFs

image-manipulation pdf

I have a situation where I need to extract images from lots of PDF files and display them on a website. My PDFs have "regular" images as well as lots of graphs.

I used pdf2xml, which pulls out the images in jpeg, ppm, pbm and vec formats. The "regular" images are extracted (for the most part) as jpeg/ppm/pbm, but the graphs are not among them, so I am guessing that pdf2xml stores them as .vec files.

So the question is: how do I get my graphs? I tried the convert tool that comes with ImageMagick to convert .vec to jpeg/png etc., but to no avail.

Best Answer

I've never tried pdf2xml, but browsing through its files on SourceForge, I found vec2svg-2.py, which appears to be a Python script to convert .vec files to .svg. You should have no difficulty converting SVG to whatever format you need.

python vec2svg-2.py -i file.vec -o file.svg
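
Since you have lots of PDFs, you will probably end up with a whole directory of .vec files. Here is a rough sketch of how the command above could be run over all of them. It assumes that vec2svg-2.py sits in the current directory, that it accepts the -i and -o options exactly as shown above, and that the interpreter named "python" on your PATH is one the script runs under. I haven't checked any of this against pdf2xml itself. The function names build_commands and convert_all are made up for this example.

```python
import subprocess
import sys
from pathlib import Path


def build_commands(vec_dir, script="vec2svg-2.py", interpreter="python"):
    """Return one argument list per .vec file found directly in vec_dir.

    Assumption: the script takes "-i input -o output", as in the command above.
    """
    commands = []
    for vec_path in sorted(Path(vec_dir).glob("*.vec")):
        # Write each SVG next to its source file, same name, .svg extension.
        svg_path = vec_path.with_suffix(".svg")
        commands.append(
            [interpreter, script, "-i", str(vec_path), "-o", str(svg_path)]
        )
    return commands


def convert_all(vec_dir, script="vec2svg-2.py", interpreter="python"):
    """Run the converter on every .vec file and return the inputs that failed."""
    failed = []
    for cmd in build_commands(vec_dir, script, interpreter):
        result = subprocess.run(cmd)
        if result.returncode != 0:
            failed.append(cmd[3])  # cmd[3] is the .vec path that follows "-i"
    return failed


if __name__ == "__main__":
    # Usage: python batch_vec2svg.py [directory]   (defaults to the current one)
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    for path in convert_all(target):
        print(f"conversion failed: {path}", file=sys.stderr)
```

Files that fail to convert are only reported, not retried, so you can inspect them by hand afterwards.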