PDF Conversion – Convert JPG to PDF Without Data Loss

conversionjpegpdf

Trying to make such things:

1st step:

convert img.jpg img.pdf

2nd step:

pdfimages -j img.pdf img1

Comparing source and extracted images in HEX shows difference. How to make such conversion without data loss?

Best Answer

One way is to use pdflatex instead of convert.

You need in a extra file, which is here called image.tex:

\documentclass{article}
\usepackage[active,tightpage]{preview}
\usepackage{graphicx}
\PreviewMacro[{*[][]{}}]{\includegraphics}
\begin{document}
   \includegraphics{img.jpg}
\end{document}

Then run pdflatex image.tex to generate image.pdf.

Related Solutions

PDF to JPG – Convert Without Quality Loss Using gscan2pdf

It's not clear what you mean by "quality loss". That could mean a lot of different things. Could you post some samples to illustrate? Perhaps cut the same section out of the poor quality and good quality versions (as a PNG to avoid further quality loss).

Perhaps you need to use -density to do the conversion at a higher dpi:

convert -density 300 file.pdf page_%04d.jpg

(You can prepend -units PixelsPerInch or -units PixelsPerCentimeter if necessary. My copy defaults to ppi.)

Update: As you pointed out, gscan2pdf (the way you're using it) is just a wrapper for pdfimages (from poppler). pdfimages does not do the same thing that convert does when given a PDF as input.

convert takes the PDF, renders it at some resolution, and uses the resulting bitmap as the source image.

pdfimages looks through the PDF for embedded bitmap images and exports each one to a file. It simply ignores any text or vector drawing commands in the PDF.

As a result, if what you have is a PDF that's just a wrapper around a series of bitmaps, pdfimages will do a much better job of extracting them, because it gets you the raw data at its original size. You probably also want to use the -j option to pdfimages, because a PDF can contain raw JPEG data. By default, pdfimages converts everything to PNM format, and converting JPEG > PPM > JPEG is a lossy process.

So, try

pdfimages -j file.pdf page

You may or may not need to follow that with a convert to .jpg step (depending on what bitmap format the PDF was using).

I tried this command on a PDF that I had made myself from a sequence of JPEG images. The extracted JPEGs were byte-for-byte identical to the source images. You can't get higher quality than that.

How to use Bash to find 2 bytes in a binary file, increase their values, and replace

Testing with this file:

$ echo hello world > test.txt
$ echo -n $'\x1b\x1f' >> test.txt
$ echo whatever >> test.txt
$ hexdump -C test.txt 
00000000  68 65 6c 6c 6f 20 77 6f  72 6c 64 0a 1b 1f 77 68  |hello world...wh|
00000010  61 74 65 76 65 72 0a                              |atever.|
$ grep -a -b --only-matching $'\x1b\x1f' test.txt 
12:

So in this case the 1B 1F is at position 12.

Convert to integer (there is probably an easier way)

$ echo 'ibase=16; '`xxd -u -ps -l 2 -s 12 test.txt`  | bc
6943

And the reverse:

$ printf '%04X' 6943 | xxd -r -ps | hexdump -C
00000000  1b 1f                                             |..|
$ printf '%04X' 4242 | xxd -r -ps | hexdump -C
00000000  10 92                                             |..|

And putting it back in the file:

$ printf '%04X' 4242 | xxd -r -ps | dd of=test.txt bs=1 count=2 seek=12 conv=notrunc
2+0 records in
2+0 records out
2 bytes (2 B) copied, 5.0241e-05 s, 39.8 kB/s

Result:

$ hexdump -C test.txt
00000000  68 65 6c 6c 6f 20 77 6f  72 6c 64 0a 10 92 77 68  |hello world...wh|
00000010  61 74 65 76 65 72 0a                              |atever.|

Best Answer

Related Solutions

PDF to JPG – Convert Without Quality Loss Using gscan2pdf

How to use Bash to find 2 bytes in a binary file, increase their values, and replace

Related Question