Extracting background images from a PDF file

I have a PDF file containing maps of the building I work in, here:

http://www.libsys.und.edu/dev/FloorPlans_All.pdf

The original source files have been lost, and I've been asked to extract the map images, preferably without the text and icons that have been overlaid on top of them. This has proven annoyingly difficult.

So far, I have tried the following GUI programs:

Adobe Reader: lets me select text, but not the background images
FoxIt PDF Viewer: lets me select text, but not the background images
XPDF on Ubuntu 10.10: lets me select text, but not the background images

And also the following command-line programs:

pdfimages: extracts the icons indicating bathrooms just fine, but not the background images
pdftohtml: same as pdfimages, plus it makes a poorly marked-up HTML document
pdfextract: same as pdfimages
convert: successfully saved images, but with the text burned into them

I've even tried opening the PDF manually in a text editor and extracting the stream objects by pasting them into a new file and saving it with a .jpg, .png, or .bmp extension (each in turn). Considering how little I know about the internal structure of PDF files, it's no surprise that this didn't work.
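A few lines of Python can show why the copy-and-paste approach fails. A stream's bytes form a usable image file in essentially one case: when its dictionary lists the /DCTDecode filter, the data is a complete JPEG. Other image streams typically hold compressed raw pixel samples with no file header, and pasting binary data through a text editor can also alter bytes on save. The helpers below (find_jpeg_streams and save_jpegs are hypothetical names, not part of any existing tool) are a minimal sketch. They assume the stream dictionaries are stored uncompressed in the file, and they use a simple regular expression instead of a real PDF parser, so they will miss images kept inside compressed object streams.

```python
import re
from typing import List

# "N G obj", then the object's dictionary text (never crossing an "endobj"),
# then the "stream" keyword and its end-of-line marker. The \b stops the
# pattern from matching the "stream" inside "endstream".
_STREAM_START = re.compile(
    rb"\d+\s+\d+\s+obj((?:(?!endobj).)*?)\bstream\r?\n", re.DOTALL
)


def find_jpeg_streams(pdf: bytes) -> List[bytes]:
    """Return the raw bytes of every image stream whose filter is DCTDecode.

    Hypothetical sketch: assumes uncompressed stream dictionaries and that
    each stream's data is followed by an end-of-line marker and "endstream".
    """
    jpegs = []
    for match in _STREAM_START.finditer(pdf):
        header = match.group(1)
        # Only DCTDecode image streams are complete JPEG files as stored.
        if b"/Image" not in header or b"/DCTDecode" not in header:
            continue
        start = match.end()
        end = pdf.find(b"endstream", start)
        if end == -1:
            continue
        data = pdf[start:end]
        # Drop the end-of-line marker that precedes "endstream";
        # it is not part of the stream data.
        if data.endswith(b"\r\n"):
            data = data[:-2]
        elif data.endswith((b"\n", b"\r")):
            data = data[:-1]
        jpegs.append(data)
    return jpegs


def save_jpegs(pdf_path: str, prefix: str = "image") -> int:
    """Write each DCTDecode stream to prefix-000.jpg, prefix-001.jpg, ..."""
    # Binary mode matters: the bytes must not pass through any text decoding.
    with open(pdf_path, "rb") as f:
        jpegs = find_jpeg_streams(f.read())
    for i, data in enumerate(jpegs):
        with open(f"{prefix}-{i:03d}.jpg", "wb") as out:
            out.write(data)
    return len(jpegs)
```

For example, save_jpegs("FloorPlans_All.pdf", "map") would write map-000.jpg and so on, but only if the file actually contains DCT-encoded images; it finds roughly the same set of JPEGs that pdfimages -j would.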

So … is there any way I can retrieve the map images from this thing without also getting the text and icons?

Best Answer

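You can download the Xpdf tools, which include pdfimages, from http://www.foolabs.com/xpdf/download.html for Linux and Windows. Then run pdfimages -j input.pdf output and you should get output-000.jpg, output-001.jpg, etc. The -j option saves images that are stored in the PDF as DCT (JPEG) data directly as .jpg files; all other images are written as PBM (monochrome) or PPM files. Also, check out http://linuxcommand.org/man_pages/pdfimages1.html for more usage options.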
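Because -j only affects DCT-encoded images, the output folder can end up with a mix of .jpg, .pbm and .ppm files. A small Python sketch for checking what each output file actually contains by its first bytes is below. It assumes pdfimages writes the binary PBM ("P4") and PPM ("P6") variants, and image_kind and classify_outputs are hypothetical helper names.

```python
import glob
from typing import Dict, Optional


def image_kind(header: bytes) -> Optional[str]:
    """Guess a file's type from its first bytes (magic number)."""
    if header.startswith(b"\xff\xd8\xff"):
        return "jpeg"  # JPEG start-of-image marker
    if header.startswith(b"P4"):
        return "pbm"   # binary bitmap, used for monochrome images
    if header.startswith(b"P6"):
        return "ppm"   # binary pixmap, used for non-monochrome images
    return None


def classify_outputs(root: str) -> Dict[str, Optional[str]]:
    """Map every file named root-* to its detected type, in sorted order."""
    kinds = {}
    for path in sorted(glob.glob(f"{root}-*")):
        with open(path, "rb") as f:
            kinds[path] = image_kind(f.read(3))
    return kinds
```

After running pdfimages -j input.pdf output, classify_outputs("output") would list each extracted file together with its detected type.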
