Pdf – How to split PDF file into layers

adobe-reader, ghostscript, pdf, vector graphics

I have a large PDF file containing a map. The PDF file was probably generated with AutoCAD.

The image consists of a coloured raster map with a vector layer of lines (streets etc.) drawn on top of it.

I need to work with the raster and the vector separately. When I import it into Photoshop, it only sees one layer. The Layers tab in Adobe Reader also shows only one layer. But I am sure there are multiple layers: when the file renders, it first draws the map in the background, and only afterwards draws the vector on top. If I am fast enough, I can actually use "print screen" to save the background raster. I need a more reliable method to extract that image, and also the vector.

Can I use an open-source tool like Ghostscript to split the PDF into its essential parts (text, raster, vector data), and then put them all in a folder?

Best Answer

I've found a manual solution using Inkscape, and am looking for ways to automate it.

  1. Open the PDF in Inkscape (I too had a map like yours). Go with the default import settings.
  2. Go to Menu > Object > Objects (not Layers).
  3. It opens an objects panel. This is just like layers. We can click on the left columns to toggle visibility, lock it, etc.
  4. There's one item there, but it has an arrow indicating there might be more. I click that, and it expands to show several sub-items.
  5. As I click on each one, on the image the different objects get selected. On toggling visibility (closing the eye), each object disappears from the image.
  6. After hiding everything I didn't want, I go to File > Export PNG image. I had to increase the size and DPI to get a good resolution; the default settings produce only a small thumbnail.
  7. I now have the map I needed.

Automation

I found a command line way of doing this.

inkscape -z -i g2846 -j -D -d 300 test3.pdf -e 3.png

Reference doc: https://inkscape.org/sk/doc/inkscape-man.html

Explaining the parameters:

  • -z : no gui, run inkscape in command line only
  • -i g2846 : select the specific group/layer id to export. I found this id/label through the manual steps in the Inkscape GUI described above.
  • -j : hide all other layers etc in the export
  • -D : keep the exported image's dimensions the same as the whole drawing/doc, and maintain the extracted object's position. (This is important if the original object is rotated/warped and you want the output to match the original, or if you're extracting multiple layers and need to maintain their positions on the canvas.)
  • -d 300 : 300 DPI. The default resolution made the output PNG too coarse; this setting kept it all good at my end.
  • test3.pdf : my input pdf
  • -e 3.png : export as PNG, and filename given.
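
The id passed to -i was looked up in the GUI. A possible way to list candidate ids from the command line is Inkscape's --query-all option, which prints one line per object in the form id,x,y,width,height. The sketch below is a minimal example, assuming --query-all accepts the PDF the same way the export command does and that imported groups get ids of the form g followed by digits (as with g2846); the helper name list_group_ids is hypothetical.

```shell
# Filter `inkscape --query-all` output (id,x,y,width,height per line)
# down to ids that look like auto-generated groups, e.g. g2846.
# Assumption: groups created by the PDF import are named "g<digits>".
list_group_ids() {
    awk -F, '$1 ~ /^g[0-9]+$/ { print $1 }'
}

# Example (not run here; needs Inkscape installed):
# inkscape -z --query-all test3.pdf | list_group_ids
```

Each printed id still has to be checked against the drawing, since the filter only looks at the id's shape and cannot tell which group holds the raster and which holds the vector lines.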

Unfortunately we can only extract one object/layer at a time for now. There is a bug filed against Inkscape requesting support for multiple layers: "Allow several -i (--export-id=ID) options".
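
Since only one id can be exported per run, a small shell loop can call the same command once per id. This is a minimal sketch, assuming the same Inkscape flags as the command above (newer Inkscape releases may name the output option differently, so check inkscape --help). The function name export_layers, the INKSCAPE override variable and the second id in the usage comment are hypothetical.

```shell
# Export each given object id from a PDF to its own PNG (<id>.png).
# Usage: export_layers input.pdf id1 [id2 ...]
# INKSCAPE may be set to point at a different inkscape binary.
export_layers() {
    pdf=$1
    shift
    for id in "$@"; do
        # Same flags as the single-layer command: no GUI, export only
        # this id, keep the full drawing area, 300 DPI.
        "${INKSCAPE:-inkscape}" -z -i "$id" -j -D -d 300 "$pdf" -e "$id.png" || return 1
    done
}

# Example (g2850 is a placeholder; look your ids up in the Objects panel):
# export_layers test3.pdf g2846 g2850
```

Because every export uses -D, the resulting PNGs share the same canvas size and can be stacked later without realigning them.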

[EDIT] Another workaround if you want multiple (but not all) layers: use the Inkscape command shared above to export the individual layers as 1.png, 2.png, 3.png. Then run the following ImageMagick command:

$ convert -page +0+0 1.png \
-page +0+0 2.png \
-page +0+0 3.png \
-layers merge +repage merged.png

That should merge the layers together to merged.png.
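
The merge command can be generalised to any number of layer images. This is a minimal sketch that builds the same "-page +0+0 FILE" argument list for every input; the function name merge_layers and the CONVERT override variable are hypothetical (on installations where the ImageMagick binary is called magick rather than convert, CONVERT can be set accordingly).

```shell
# Merge any number of same-sized layer PNGs into one image.
# Usage: merge_layers output.png layer1.png [layer2.png ...]
merge_layers() {
    out=$1
    shift
    n=$#
    i=0
    # Rotate through the inputs, re-appending each as "-page +0+0 FILE".
    while [ "$i" -lt "$n" ]; do
        f=$1
        shift
        set -- "$@" -page +0+0 "$f"
        i=$((i + 1))
    done
    "${CONVERT:-convert}" "$@" -layers merge +repage "$out"
}

# Example:
# merge_layers merged.png 1.png 2.png 3.png
```

Layers are stacked in the order given, so the background raster should come first and the vector overlays after it.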
