Ubuntu – Why are the images produced by pdfimages different when using the -all flag?

Tags: command-line, image-processing, imagemagick, pdf

It's my understanding that pdfimages -all extracts images from PDFs in their native formats.

Therefore, I expected that the JPG (lossy) images extracted from that command would have the same pixel information as the .ppm and .pbm files produced without the -all option, as well as the PNG (lossless) files created when I right-click and save the image in Evince.

However, ImageMagick's compare command reports differences between the images in the JPG files and those produced by the other methods above.

To reproduce: download the PDF from this link (https://fccid.io/document.php?id=2149405), run both pdfimages and pdfimages -all on it, and pass the first .ppm file and the first .jpg file to compare. The difference image it produces contains red pixels, which mark where the two images differ.

Is there something that I don't understand? Is pdfimages adding pixel information by default when it creates .ppm and .pbm files?
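
Whether a given output file really holds JPEG data or decoded pixels can be checked from its first few bytes rather than its extension. A minimal sketch, assuming the standard magic numbers (JPEG starts with FF D8 FF, PNG with 89 50 4E 47, binary PBM/PGM/PPM with "P4"/"P5"/"P6"); sniff_format is a hypothetical name, not part of poppler or ImageMagick:

```shell
#!/bin/sh
# Guess a file's format from its leading "magic" bytes instead of its name.
# sniff_format is a hypothetical helper used only for this sketch.
sniff_format() {
    # First four bytes as one lowercase hex string, e.g. "ffd8ffe0".
    magic=$(head -c 4 "$1" | od -A n -t x1 | tr -d ' \n')
    case "$magic" in
        ffd8ff*)  echo jpeg ;;  # JPEG: SOI marker FF D8, then FF
        89504e47) echo png ;;   # PNG signature: 89 'P' 'N' 'G'
        5034*)    echo pbm ;;   # "P4": binary PBM
        5035*)    echo pgm ;;   # "P5": binary PGM
        5036*)    echo ppm ;;   # "P6": binary PPM
        *)        echo unknown ;;
    esac
}

# Usage: sh sniff.sh img-000.ppm img-all-000.jpg ...
for f in "$@"; do
    printf '%s: %s\n' "$f" "$(sniff_format "$f")"
done
```

Run on the question's files, this should report ppm or pbm for the plain pdfimages output and jpeg for the -all output. It only shows how the data is stored, not whether the decoded pixels match.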

Best Answer

pdfimages -all returns the exact file that was stored in the pdf.

We can test this by doing a round-trip: starting with a jpg image, we add it to a pdf using LaTeX, extract it using pdfimages -all, and then compare it to the original. (The reason for using LaTeX will be explained later.)

I took the first jpg image extracted from the PDF at your link and named it device.jpg. Let's put it in a PDF file using LaTeX:

$ cat img.tex 
\documentclass{article}
\usepackage{graphicx}
\begin{document}
\includegraphics[width=5in,keepaspectratio]{device}
\end{document}
$ pdflatex img
[...snip...]
Output written on img.pdf (1 page, 672455 bytes).
Transcript written on img.log.

Now, let's extract it using pdfimages -all and compare it with the original:

$ pdfimages -all img.pdf img-all
$ cmp device.jpg img-all-000.jpg 
$

The extracted jpg is byte-for-byte identical to the original.

Footnote: the reason for using LaTeX

This test cannot be done with just any PDF creator, because not all PDF creators put images into a PDF unmolested. For example, let's try ImageMagick's convert:

$ convert device.jpg device.pdf
$ pdfimages -all device.pdf device-all
$ cmp device.jpg device-all-000.jpg 
device.jpg device-all-000.jpg differ: byte 4, line 1
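
cmp stops at the first difference. In a JPEG file, bytes 1 to 3 are FF D8 FF (the start-of-image marker plus the prefix of the next marker), so byte 4 is the type code of the first header segment, commonly E0 (JFIF APP0) or E1 (Exif APP1). A difference that early suggests the file was rewritten rather than copied. cmp -l lists every differing byte with its 1-based offset and octal values; a minimal sketch that shows the first few in hex (show_diffs is a hypothetical name):

```shell
#!/bin/sh
# List the first N differing bytes of two files as "offset hex1 hex2".
# show_diffs is a hypothetical helper written for this sketch.
show_diffs() {
    # cmp -l prints "offset octal1 octal2" for every differing byte;
    # any complaint about one file being shorter goes to stderr.
    cmp -l "$1" "$2" 2>/dev/null | head -n "${3:-5}" |
    while read -r off a b; do
        # A leading 0 makes printf read the value as octal.
        printf '%s %02x %02x\n' "$off" "0$a" "0$b"
    done
}

# Usage: show_diffs device.jpg device-all-000.jpg 10
```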

convert re-encoded the image before placing it in the pdf, so the extracted JPEG is a new, smaller file:

$ ls -1s device.jpg device-all-000.jpg 
528 device-all-000.jpg
656 device.jpg
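
ls -s counts allocated disk blocks (1 KiB units by default with GNU ls), so 528 versus 656 is approximate: roughly a fifth smaller. For exact figures, a minimal sketch using wc -c (size_report is a hypothetical name):

```shell
#!/bin/sh
# Print exact byte sizes of two files and the relative change.
# size_report is a hypothetical helper written for this sketch.
size_report() {
    # tr removes the leading padding some wc implementations print.
    orig=$(wc -c < "$1" | tr -d ' ')
    new=$(wc -c < "$2" | tr -d ' ')
    printf '%s: %s bytes\n%s: %s bytes\n' "$1" "$orig" "$2" "$new"
    if [ "$orig" -gt 0 ]; then
        # Integer percentage, truncated; negative if the second file is larger.
        echo "reduction: $(( (orig - new) * 100 / orig ))%"
    fi
}

# Usage: size_report device.jpg device-all-000.jpg
```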

Image accuracy was part of pdflatex's design goals. Other PDF creation software may, by default, "optimize" images before placing them in the PDF.

Update: ShreevatsaR points out that the img2pdf utility also provides a lossless method to convert images to PDF. Non-TeX users will also likely find it much simpler to use.
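
The same round trip can be scripted with img2pdf instead of LaTeX. A minimal sketch, assuming img2pdf's -o option for the output file and that pdfimages names the first extracted image <root>-000.jpg, as in the transcripts above; roundtrip_check is a hypothetical name. It returns 0 when the bytes survive, 1 when they differ, and 2 if a tool is missing or a step fails:

```shell
#!/bin/sh
# Wrap a JPEG in a PDF with img2pdf, extract it again with pdfimages -all,
# and report whether the extracted file is byte-for-byte identical.
# roundtrip_check is a hypothetical helper written for this sketch.
roundtrip_check() {
    for tool in img2pdf pdfimages cmp; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing tool: $tool" >&2
            return 2
        fi
    done
    dir=$(mktemp -d) || return 2
    # Assumes the first image is written as "$dir/rt-000.jpg".
    if img2pdf "$1" -o "$dir/rt.pdf" && pdfimages -all "$dir/rt.pdf" "$dir/rt"; then
        if cmp -s "$1" "$dir/rt-000.jpg"; then
            echo identical; status=0
        else
            echo differ; status=1
        fi
    else
        status=2
    fi
    rm -rf "$dir"
    return "$status"
}

# Usage: roundtrip_check device.jpg
```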

Related Solutions

Ubuntu – How to batch convert an image to a PDF

One workaround is to split the image processing from the PDF conversion: first use convert to fit each image to A4 at 300 dpi (portrait, i.e. 2479x3508 pixels), then convert it to PDF with sam2p, and finally use sam2p_pdf_scale to scale the PDF page to A4.

convert in.png -rotate "90>" -scale 2479x3508 -bordercolor white -border 64x64 out.png
sam2p out.png out.pdf
sam2p_pdf_scale 595 842 out.pdf
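
The numbers in these commands are A4 (210 x 297 mm) in two units: pixels at 300 dpi for the raster image and PostScript points (72 per inch) for sam2p_pdf_scale. Since 1 inch is 25.4 mm, each value is mm x dpi / 25.4, rounded. A minimal sketch of that conversion in integer shell arithmetic (mm_to_px is a hypothetical name):

```shell
#!/bin/sh
# Convert a length in millimetres to whole units at a given resolution.
# mm_to_px is a hypothetical helper: units = mm * dpi / 25.4, rounded.
mm_to_px() {
    # Work in tenths of a millimetre so 25.4 becomes the integer 254;
    # adding 127 (half of 254) rounds to the nearest unit.
    echo $(( ($1 * $2 * 10 + 127) / 254 ))
}

# A4 at 300 dpi (pixels) and at 72 dpi (PostScript points).
echo "A4 @ 300 dpi: $(mm_to_px 210 300)x$(mm_to_px 297 300) px"
echo "A4 in points: $(mm_to_px 210 72)x$(mm_to_px 297 72) pt"
```

Running it prints 2480x3508 px and 595x842 pt. The points match the sam2p_pdf_scale arguments; the rounded width of 2480 is one pixel more than the 2479 used in this answer, which makes no visible difference.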

Edit: A more complete script:

#!/bin/sh

A4_WIDTH=2479
A4_HEIGHT=3508

H_MARGIN=64
V_MARGIN=64
WIDTH=$((${A4_WIDTH} - ${H_MARGIN} * 2))
HEIGHT=$((${A4_HEIGHT} - ${V_MARGIN} * 2))

for i in "$@"; do
    TMP="/tmp/$(uuidgen).png"
    echo "$i"
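    # Read the input first, rotate landscape images to portrait, fit them
    # inside the margins, add a white border, pad to a full A4 page, and
    # stamp the file name rotated 90 degrees (white copies offset by 2 px
    # under the black text give it an outline).
    # Note: file names containing a single quote break the -draw strings.
    convert "$i" \
        -rotate "90>" \
        -scale "${WIDTH}x${HEIGHT}" \
        -bordercolor white -border "${H_MARGIN}x${V_MARGIN}" \
        -gravity center \
        -extent "${A4_WIDTH}x${A4_HEIGHT}" \
        -font helvetica -pointsize 80 \
        -fill white -draw \
        "push graphic-context
         translate $((${A4_WIDTH}/2 - 160)), 0
         rotate 90
         text -2,-2 '$i'
         text -2,2 '$i'
         text 2,-2 '$i'
         text 2,2 '$i'
         pop graphic-context
    " \
        -fill black -draw \
        "push graphic-context
         translate $((${A4_WIDTH}/2 - 160)), 0
         rotate 90
         text 0,0 '$i'
         pop graphic-context
    " \
        "$TMP"
    sam2p "$TMP" "${i}.pdf"
    sam2p_pdf_scale 595 842 "${i}.pdf"
    # Remove the intermediate PNG.
    rm -f "$TMP"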
done

# EOF #
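
Two options in the script decide the final geometry: -rotate "90>" rotates only images that are wider than they are tall, and -scale WxH resizes the image to fit inside the box while keeping its aspect ratio. A rough sketch of that arithmetic, useful for predicting how much of the page an image will fill before the border and -extent padding are added (portrait_fit is hypothetical, and ImageMagick's own rounding may differ by a pixel):

```shell
#!/bin/sh
# Predict the size after -rotate "90>" and -scale BOXWxBOXH.
# portrait_fit is a hypothetical helper; rounding may differ from ImageMagick.
portrait_fit() {
    w=$1; h=$2; bw=$3; bh=$4
    # "90>": rotate only when the image is wider than it is tall.
    if [ "$w" -gt "$h" ]; then
        swap=$w; w=$h; h=$swap
    fi
    # Compare w/bw with h/bh by cross-multiplying to stay in integers.
    if [ $(( w * bh )) -ge $(( h * bw )) ]; then
        # Width is the limiting side.
        echo "${bw}x$(( (h * bw + w / 2) / w ))"
    else
        # Height is the limiting side.
        echo "$(( (w * bh + h / 2) / h ))x${bh}"
    fi
}

# The script's box: 2479-2*64 by 3508-2*64 = 2351x3380.
portrait_fit 4000 3000 2351 3380
```

For a 4000x3000 landscape photo the sketch predicts 2351x3135, so the image spans the full width between the margins and -extent adds white space above and below.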

Ubuntu – ImageMagick convert can’t convert to webp

Fixed in 16.04

In 16.04, convert flyer.png flyer.webp does work, but only if the webp package is installed, because convert hands WebP encoding to the cwebp program from that package:

sudo apt-get install webp

Without webp installed, this error message will show:

convert: delegate failed `"cwebp" -quiet -q %Q "%i" -o "%o"' @ error/delegate.c/InvokeDelegate/1310.
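
As the error shows, convert calls the external cwebp program to write WebP. A minimal sketch that checks for it before converting; check_webp_delegate is a hypothetical name, and the file names are the ones from this answer:

```shell
#!/bin/sh
# Succeed only if the cwebp delegate that ImageMagick calls is on PATH.
# check_webp_delegate is a hypothetical helper written for this sketch.
check_webp_delegate() {
    if command -v cwebp >/dev/null 2>&1; then
        return 0
    fi
    echo "cwebp not found; install it with: sudo apt-get install webp" >&2
    return 1
}

# Usage: check_webp_delegate && convert flyer.png flyer.webp
```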