How to identify the format of images in a pdf

imagemagick, images, pdf

I have received a number of pdf files with images in them. The original images have been lost, so I need to extract them from the pdfs. I have Adobe Acrobat Pro, so I extracted them using Advanced > Document Processing > Export All Images, which offers four output formats: jpeg, png, tiff and jpeg2000. However, I'd like to extract the images in their original format, and that is apparently not jpeg: I also tried pdfimages.exe from xpdf as outlined here, and it produced .ppm files, not .jpg files.

So I tried ImageMagick's identify on one of them, and this is what it gave me:

identify images-000.ppm
images-000.ppm PPM 870x1181 870x1181+0+0 8-bit sRGB 3.082MB 0.000u 0:00.000

Does this indicate that it was an embedded .bmp? How can I tell? I would have expected Acrobat to have a function for identifying the format of images, but I couldn't find one.

So, what is the best way to identify the format of the images in a pdf?

(I prefer extraction via Acrobat because of the batch functionality).

Best Answer

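AFAIK, the Image XObjects embedded inside PDFs do not store any information about the original image file format. They only record the image's dimensions, color space and pixel data, plus a /Filter entry naming how that data is compressed. At most, if the image is stored as a JPEG (the DCTDecode filter), it can be extracted as-is, for example with pdfimages -j; in all other cases you end up with a PxM image that you'll need to convert. So the .ppm output does not mean the original was a BMP: PPM is simply what pdfimages writes for color images it does not pass through as JPEG.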
