Unix way to extract vectorised image and its graph from a PDF file

images pdf

Data: page 16 of an LHC thesis, where the picture is vectorised (most probably .eps).
I am reviewing the answer in the thread Software needed to scrape data from graph.
I cannot find any tool made for extracting an .eps image from a PDF file.
Pseudocode of my whole system:

  1. Neutralise the PDF file with gs -dSAFER -dBATCH -dNOPAUSE -sDEVICE=pdfwrite -sOutputFile=newfile.pdf badfile.pdf (source)
  2. Find the native resolution for extracting the vectorised image from the PDF file. (I am not sure about this step, because no zooming may be necessary; a screenshot at the 100% zoom level of Adobe's viewer cannot be optimal.)
  3. Extract the vectorised image from the PDF file (current goal)
  4. Extract the graph from the .eps image

Doing all of this in the same system would be great.
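Steps 1 and 3 can be sketched with Ghostscript alone. This is a minimal sketch, not a tested recipe: it assumes a Ghostscript build that includes the eps2write device, and the function names clean_pdf and page_to_eps, the RUN variable, and the file names are made up for illustration. The resulting EPS holds the whole page, text included, so isolating the figure remains a separate step.

```shell
#!/bin/sh
# Sketch of steps 1 and 3 using Ghostscript only.
# Set RUN=echo to print the commands instead of executing them.

# Step 1: rewrite the PDF through Ghostscript (same command as in step 1).
clean_pdf() {
    # $1 = input PDF, $2 = output PDF
    ${RUN:-} gs -dSAFER -dBATCH -dNOPAUSE -sDEVICE=pdfwrite \
        -sOutputFile="$2" "$1"
}

# Step 3: write a single page as EPS; vector drawing operators stay vector.
page_to_eps() {
    # $1 = input PDF, $2 = page number, $3 = output EPS
    ${RUN:-} gs -dSAFER -dBATCH -dNOPAUSE -sDEVICE=eps2write \
        -dFirstPage="$2" -dLastPage="$2" -sOutputFile="$3" "$1"
}
```

For the thesis in question, clean_pdf badfile.pdf data_clean.pdf followed by page_to_eps data_clean.pdf 16 page16.eps would target page 16.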

Open tools with (3)

Possible image formats png/xpm/jpeg/tiff/pnm/ras/bmp/gif

  • g3data, but it does not support the .eps format
  • Engauge Digitizer is active here, and is more popular than R digitize.
  • R digitize was removed from CRAN for lack of a maintainer. It now lives in tpoisot's GitHub here, with a review in Luke's blog, Digitizing data from old plots using ‘digitize’, and there is a ticket here about getting it back on CRAN. I have experienced a sequence of problems with the software here. One big weakness is that they censor their GitHub, and no feedback is welcome.

Systems with (3) and (4)

  • most probably R package which can do both things:

Tools only with (3) or (4) or none

  • Task (4) can be done in Mathematica, as described here in Is it possible to extract data from an eps plot not generated in Mathematica. However, Mathematica is not suitable for Task (3) according to devtalk.
  • Adobe Acrobat > Editing. I could not find any suitable method there. There also seems to be no Linux version for Ubuntu 16.04.
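Outside Mathematica, a rough approximation of Task (4) is to convert the page to SVG and read the path coordinates as text. The sketch below assumes pdftocairo from poppler-utils and a grep that supports -o. The function names, the RUN variable, and the assumed layout of the path data (commands and numbers separated by spaces, each path on one line) are illustrative assumptions. Only straight-line vertices are listed, and glyph outlines and axis lines appear alongside the data curve, so picking out the curve is still manual.

```shell
#!/bin/sh
# Sketch: convert one PDF page to SVG, then list straight-line path vertices.
# Set RUN=echo to print the pdftocairo command instead of executing it.

page_to_svg() {
    # $1 = input PDF, $2 = page number, $3 = output SVG
    ${RUN:-} pdftocairo -svg -f "$2" -l "$2" "$1" "$3"
}

svg_path_points() {
    # $1 = SVG file; prints "x y" for every M (moveto) and L (lineto) vertex.
    # Assumes path data of the form d="M 10 20 L 30 40 " on a single line.
    grep -o ' d="[^"]*"' "$1" |
        sed -e 's/^ d="//' -e 's/"$//' |
        awk '{ for (i = 1; i <= NF; i++) if ($i == "M" || $i == "L") print $(i + 1), $(i + 2) }'
}
```

The printed coordinates are in page units, not data units. Mapping them to the values on the axes requires a calibration against at least two known tick positions per axis.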

From vectorized and Steps (1-2)

Drag-and-drop of the figure does not work here.
So I must extract the figure from the PDF programmatically.
There are terminal tools that extract all images/eps/… from a document, but I have no idea how well they do it.
I would like to find something that is simply good at extracting the .eps image from a PDF file.
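One such terminal tool is pdfimages from poppler-utils. It extracts embedded raster images only, so its -list option can hint whether the figure on a page is raster or vector: if nothing beyond the table header is printed for the page, the figure is presumably drawn with vector operators. A minimal sketch follows; the function name list_page_images, the RUN variable, and the file name are assumptions for illustration.

```shell
#!/bin/sh
# Sketch: list embedded raster images on one page with pdfimages (poppler-utils).
# Set RUN=echo to print the command instead of executing it.

list_page_images() {
    # $1 = PDF file, $2 = page number
    # -list prints a table of images instead of writing image files;
    # -f and -l restrict the scan to the given page.
    ${RUN:-} pdfimages -list -f "$2" -l "$2" "$1"
}
```

For example, list_page_images data_clean.pdf 16 would inspect page 16 only.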

From Rasterized to Vectorized and Steps (1-2)

Example image for DavidLeBauer about the intersection of the graph with the x-axis, for the discussion here


and a second example for David, about points intersecting the two axes here


Code

# https://unix.stackexchange.com/q/281211/16920
gs -dSAFER -dBATCH -dNOPAUSE -sDEVICE=pdfwrite -sOutputFile=data_clean.pdf badfile.pdf

# Drag and drop the picture from data_clean.pdf to your folder in Ubuntu 16.04 at the default zoom level; I think the zoom level should not affect the result of drag-and-drop.
# Result: image.png

# g3data image.png
# Bug in 16.04: http://askubuntu.com/q/767982/25388

# Open the figure in Ubuntu and choose Print to File > Ps.
# Result: image.png.ps

ps2eps image.png.ps
# Result: image.png.eps

(* https://mathematica.stackexchange.com/q/85320/9815 *)
(* Mathematica starts here *)

(* Wolfram Language Test file *)

fig = Import["image.png.eps"]

Import["http://raw.github.com/AlexeyPopkov/shortInputForm/master/shortInputForm.m"]

fig // shortInputForm

(* Running this gives an error: http://askubuntu.com/q/767992/25388 *)
(* NB: the error also appears when the editor contains no code, so something is wrong in how I run it. I am an amateur in Mathematica. *)

How can you extract an .eps image and its graph from a PDF file in the Unix way?

Best Answer

No sufficiently supported solution exists for this case, because in reality the problem is a hard inverse problem. Mathematica solutions also have significant problems in real-world applications.
