Replace an image in a PDF using the command line

command line, images, pdf

I need to process some PDF files. The task consists of exchanging a given image for another. My first problem is how to replace an image in a PDF from the command line, as part of a batch process. Later I'll address other problems, such as how to identify which image I need to replace (the PDF files may contain more than one image). But first I want to solve the first problem: how to replace an image in a PDF with another.

I've read about poppler-utils and pdftk but, as far as I know, neither of these tools can replace images in a PDF.

Best Answer

OK ... I think pdflatex is the missing piece here.

The OP said he has looked into poppler-utils and pdftk. Let me add pdfimages (itself part of poppler-utils) to that. These, together with pdflatex, are the pieces of a solution.

pdfimages -f 4 -l 20 -j -png target.pdf imageroot

In the example code above, pdfimages looks through pages 4 through 20 of target.pdf and extracts all images into files with names beginning imageroot.
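Since the OP also wants to run this in batch and work out which image to replace, here is a minimal sketch of how the extraction step might be wrapped. It assumes a poppler version whose pdfimages supports the -list option; image_root and extract_images are hypothetical helper names, not part of poppler-utils.

```shell
#!/bin/sh
# Sketch: list and extract the images of a PDF.
# image_root and extract_images are hypothetical helpers.

# Derive an output prefix such as "report-img" from "docs/report.pdf".
image_root() {
    printf '%s-img\n' "$(basename "$1" .pdf)"
}

# List the images of one PDF, then extract them into the current directory.
extract_images() {
    pdf=$1
    # -list prints one row per image (page, image number, type, size, ...)
    pdfimages -list "$pdf"
    # -j keeps JPEG images as .jpg; -png writes the others as .png
    pdfimages -j -png "$pdf" "$(image_root "$pdf")"
}

# Batch usage (uncomment to run on every PDF in the current directory):
# for f in *.pdf; do extract_images "$f"; done
```

The -list table shows the page and dimensions of each image, which may be enough to pick out the one to replace before looking at the extracted files.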

poppler-utils provides pdftotext. I recommend the -layout option, which does a great job of keeping the document human readable.

pdftotext -layout "$1.pdf" "$1.txt"

The OP's objection to the imagemagick solution offered by pidosaurus is that an image does not have extractable text. With the utilities I outlined, the OP will now have all the images as well as all the extracted text, and the -layout option keeps the text of each page in its original arrangement.

The OP could identify the correct page of text and chuck it into a .tex file which ends with an \includegraphics directive that refers to the replacement picture by filename. You then run pdflatex on this file and end up with a new single-page .pdf to insert into the rest of your document with pdftk. If you know where in the text of the original page the image resided, you can put the \includegraphics inside a figure environment with the [h] placement option at that point and get the image in approximately the right place.
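The rebuild step described above could look roughly like the following sketch. It assumes pdflatex (with the graphicx package) and pdftk are installed; make_page_tex and splice_page are hypothetical helper names, and target.pdf, replacement.png and page 7 are made-up examples. The page text is wrapped in a verbatim environment to keep the -layout spacing, which is a simplification and will not reproduce the original typography.

```shell
#!/bin/sh
# Sketch of the rebuild step. Helper names, file names and the page
# number are hypothetical.

# Print a one-page LaTeX document: the extracted text in a verbatim
# block, followed by the replacement image in a figure with [h] placement.
make_page_tex() {
    text_file=$1
    image_file=$2
    printf '%s\n' '\documentclass{article}' '\usepackage{graphicx}' \
        '\pagestyle{empty}' '\begin{document}' '\begin{verbatim}'
    # pdftotext ends each page with a form feed; drop it.
    tr -d '\f' < "$text_file"
    printf '%s\n' '\end{verbatim}' '\begin{figure}[h]' '\centering'
    printf '\\includegraphics[width=\\linewidth]{%s}\n' "$image_file"
    printf '%s\n' '\end{figure}' '\end{document}'
}

# Replace page $3 of PDF $1 with the single page in PDF $2, writing $4.
# Assumes the page is neither the first nor the last one; those cases
# need a shorter range list.
splice_page() {
    original=$1
    new_page=$2
    page=$3
    output=$4
    pdftk A="$original" B="$new_page" \
        cat "A1-$((page - 1))" B1 "A$((page + 1))-end" \
        output "$output"
}

# Example run for page 7 (uncomment once the tools are installed):
# pdftotext -f 7 -l 7 -layout target.pdf page7.txt
# make_page_tex page7.txt replacement.png > page7.tex
# pdflatex page7.tex
# splice_page target.pdf page7.pdf 7 result.pdf
```

Note that this rebuilds the page from plain text, so fonts and formatting of the original page are lost; it only suits documents where that is acceptable.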

Related Solutions

Command Line – Best PDF Viewer for Command Line Only

Not a real viewer, but as a first aid a converter may also help:

pdftotext file.pdf - | less

pdftohtml -stdout -i file.pdf | lynx -stdin

pdftotext and pdftohtml are part of the Poppler package.

Combine multiple PDF files into one (arranged in a matrix)

You could use the utility program pdfnup from the pdfjam suite.

pdfnup in.pdf --nup 3x3

should output the file in-nup.pdf, with the pages of in.pdf arranged in a 3x3 matrix on each page of the output.

You should first merge all of your PDF files into a single one. You may also want to specify a paper size for the output file; see the pdfjam docs for the details.
