It's not clear what you mean by "quality loss". That could mean a lot of different things. Could you post some samples to illustrate? Perhaps cut the same section out of the poor quality and good quality versions (as a PNG to avoid further quality loss).
Perhaps you need to use -density to do the conversion at a higher dpi:
convert -density 300 file.pdf page_%04d.jpg
(You can prepend -units PixelsPerInch or -units PixelsPerCentimeter if necessary. My copy defaults to ppi.)
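To get a feel for what -density changes, here is a rough sketch of the arithmetic. It assumes the page size is known in PDF points (72 per inch) and that the renderer rounds to the nearest pixel; pixels_for is a hypothetical helper, not part of ImageMagick, and the real output may differ by a pixel.

```shell
#!/bin/sh
# pixels_for is a hypothetical helper: it estimates one rendered dimension
# as points * density / 72, rounded to the nearest whole pixel.
pixels_for() {
    points=$1
    density=$2
    echo $(( (points * density + 36) / 72 ))
}

# US Letter is 612 x 792 points (8.5 x 11 inches).
echo "72 dpi:  $(pixels_for 612 72) x $(pixels_for 792 72)"
echo "300 dpi: $(pixels_for 612 300) x $(pixels_for 792 300)"
```

At ImageMagick's default density of 72 a Letter page comes out at about 612 x 792 pixels, while -density 300 gives about 2550 x 3300, which is why a low-density render of a scanned page looks soft.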
Update: As you pointed out, gscan2pdf (the way you're using it) is just a wrapper for pdfimages (from poppler). pdfimages does not do the same thing that convert does when given a PDF as input.
convert takes the PDF, renders it at some resolution, and uses the resulting bitmap as the source image.
pdfimages looks through the PDF for embedded bitmap images and exports each one to a file. It simply ignores any text or vector drawing commands in the PDF.
As a result, if what you have is a PDF that's just a wrapper around a series of bitmaps, pdfimages will do a much better job of extracting them, because it gets you the raw data at its original size. You probably also want to use the -j option to pdfimages, because a PDF can contain raw JPEG data. By default, pdfimages converts everything to PNM format, and converting JPEG > PPM > JPEG is a lossy process.
So, try
pdfimages -j file.pdf page
You may or may not need to follow that with a convert to .jpg step (depending on what bitmap format the PDF was using).
I tried this command on a PDF that I had made myself from a sequence of JPEG images. The extracted JPEGs were byte-for-byte identical to the source images. You can't get higher quality than that.
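If you want to repeat that check yourself, a minimal sketch follows. report_identical is a hypothetical helper, and the file names in the usage comment are assumptions; it relies only on cmp -s, which exits with status 0 when two files contain exactly the same bytes.

```shell
#!/bin/sh
# report_identical is a hypothetical helper. cmp -s prints nothing and
# exits 0 only when both files have exactly the same bytes.
report_identical() {
    if cmp -s "$1" "$2"; then
        echo "identical: $1 $2"
    else
        echo "different: $1 $2"
    fi
}

# Example usage (assumed file names; pdfimages numbers its output
# as <root>-000, <root>-001, and so on):
#   pdfimages -j file.pdf page
#   report_identical original_000.jpg page-000.jpg
```

A result of "different" does not necessarily mean the image was degraded; it only means the extracted file is not the same byte stream as the file you compared it with.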
You can check out pdfdraw from mupdf (package mupdf-tools under Debian and its derivatives).
From its description:
pdfdraw will render a PDF document to image files. The
supported image formats are: pgm, ppm, pam and png. Select the
pages to be rendered by specifying a comma separated list of
ranges and individual page numbers (for example: 1,5,10-15). If
no pages are specified all the pages will be rendered.
Perhaps it is faster for your use case.
For example, mupdf (a PDF viewer) is really fast (and consumes very little memory) for a lot of documents I tested it with.
Best Answer
Assuming there are only images in that folder, you can list all filenames that do not end with jpg, which I assume are all the images you want to convert. Then you can use the tool convert from ImageMagick on each of them.
The convert command expands to convert <file name as printed by ls> <file name without extension>.jpg. The extension jpg tells convert to convert to JPEG format.
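One way to sketch the approach described above is a plain shell loop. This is not necessarily the command the answer originally used; jpg_name and convert_all are hypothetical helper names, and the sketch assumes ImageMagick's convert is on your PATH.

```shell
#!/bin/sh
# jpg_name is a hypothetical helper: given a bare file name (no directory
# part), it strips the last extension and appends .jpg.
jpg_name() {
    echo "${1%.*}.jpg"
}

# convert_all is also hypothetical. It enters the given directory, skips
# files that already end in .jpg, and runs convert on every other file.
convert_all() {
    (
        cd "$1" || exit 1
        for f in *; do
            [ -f "$f" ] || continue
            case "$f" in
                *.jpg) continue ;;
            esac
            # The ./ prefix keeps names starting with "-" from being
            # read as options.
            convert "./$f" "./$(jpg_name "$f")"
        done
    )
}
```

Calling convert_all . processes the current folder. The original files are left in place, so you can compare the results before deleting anything.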