It's not clear what you mean by "quality loss". That could mean a lot of different things. Could you post some samples to illustrate? Perhaps cut the same section out of the poor quality and good quality versions (as a PNG to avoid further quality loss).
Perhaps you need to use `-density` to do the conversion at a higher dpi:

convert -density 300 file.pdf page_%04d.jpg

(You can prepend `-units PixelsPerInch` or `-units PixelsPerCentimeter` if necessary. My copy defaults to ppi.)
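As a minimal sketch (`file.pdf` is a placeholder), stating the units explicitly looks like this; the guard just skips the step where ImageMagick is not installed:

```shell
#!/bin/sh
# Render the PDF at an explicit 300 pixels per inch; "file.pdf" is a
# placeholder name. Skip gracefully if ImageMagick is not available.
if command -v convert >/dev/null 2>&1; then
    convert -units PixelsPerInch -density 300 file.pdf page_%04d.jpg
else
    echo "ImageMagick 'convert' not found; skipping" >&2
fi
```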
Update: As you pointed out, gscan2pdf (the way you're using it) is just a wrapper for `pdfimages` (from poppler). `pdfimages` does not do the same thing that `convert` does when given a PDF as input. `convert` takes the PDF, renders it at some resolution, and uses the resulting bitmap as the source image. `pdfimages` looks through the PDF for embedded bitmap images and exports each one to a file. It simply ignores any text or vector drawing commands in the PDF.

As a result, if what you have is a PDF that's just a wrapper around a series of bitmaps, `pdfimages` will do a much better job of extracting them, because it gets you the raw data at its original size. You probably also want to use the `-j` option to `pdfimages`, because a PDF can contain raw JPEG data. By default, `pdfimages` converts everything to PNM format, and converting JPEG > PPM > JPEG is a lossy process.
So, try

pdfimages -j file.pdf page

You may or may not need to follow that with a convert-to-`.jpg` step (depending on what bitmap format the PDF was using).
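A possible follow-up step, sketched under the assumption that `pdfimages` wrote PNM files named `page-*.ppm` (matching the `page` prefix used above):

```shell
#!/bin/sh
# Hypothetical cleanup step: convert each extracted PPM file to JPEG,
# keeping the same basename. Assumes files named "page-*.ppm".
for f in page-*.ppm; do
    [ -e "$f" ] || continue           # no matches: the glob stays literal
    if command -v convert >/dev/null 2>&1; then
        convert "$f" "${f%.ppm}.jpg"  # strip .ppm, append .jpg
    fi
done
```

Note that this re-encode is itself lossy, which is exactly why `-j` is preferable when the embedded images are already JPEGs.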
I tried this command on a PDF that I had made myself from a sequence of JPEG images. The extracted JPEGs were byte-for-byte identical to the source images. You can't get higher quality than that.
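One way to check this yourself is a byte-level comparison with `cmp`; the file names here are placeholders for one extracted page and its original:

```shell
#!/bin/sh
# Sketch: verify an extracted image is byte-for-byte identical to the
# source. "page-000.jpg" and "source-000.jpg" are placeholder names.
if cmp -s page-000.jpg source-000.jpg; then
    echo "identical"
else
    echo "files differ (or are missing)"
fi
```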
You can check out `pdfdraw` from mupdf (package `mupdf-tools` under Debian/Debian-derivatives). From its description:

> pdfdraw will render a PDF document to image files. The supported image formats are: pgm, ppm, pam and png. Select the pages to be rendered by specifying a comma-separated list of ranges and individual page numbers (for example: 1,5,10-15). If no pages are specified, all the pages will be rendered.
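A hedged usage sketch (`file.pdf` is a placeholder; I believe the `-o`/`-r` flags are as described, and newer mupdf releases ship the same functionality as `mutool draw`):

```shell
#!/bin/sh
# Sketch: render pages 1,5,10-15 of a placeholder "file.pdf" to PNG at
# 300 dpi. Falls back to "mutool draw" on newer mupdf releases.
if command -v pdfdraw >/dev/null 2>&1; then
    pdfdraw -o page%03d.png -r 300 file.pdf 1,5,10-15
elif command -v mutool >/dev/null 2>&1; then
    mutool draw -o page%03d.png -r 300 file.pdf 1,5,10-15
else
    echo "mupdf tools not found; skipping" >&2
fi
```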
Perhaps it is faster for your use case. For example, `mupdf` (a PDF viewer) is really fast (and consumes very little memory) for a lot of documents I tested it with.
Best Answer
I tried printing the djvu file to PDF (using Evince, so it's probably a mix of djvulibre, gtk+ and cairo), but I got a way smaller result by converting the djvu pages to pdf using ImageMagick's `convert`. For this, you need to:

1. split the multipage djvu into single pages (apparently, `convert` is not able to deal with multipage djvu and multipage pdf that easily); see `djvmcvt -i`, which produces an "indirect" document (an "indirect" document is a document where each page is stored in a separate djvu file)
2. convert each single-page djvu to PDF with `convert`: we're not losing anything here; remember that djvu is not vectorial, so even if you're generating an Adobe PDF, you're using it for a raster image
3. join the single-page PDFs with ghostscript (for example, generating `out.pdf` with all pages from `*.pdf` in the current directory would be `gs -q -sPAPERSIZE=a4 -dBATCH -dNOPAUSE -sDEVICE=pdfwrite -sOutputFile=out.pdf *.pdf`)
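The whole pipeline can be sketched as one script; `book.djvu`, the `pages` directory, and `index.djvu` are assumed names, and each tool is only invoked if installed:

```shell
#!/bin/sh
# Sketch of the djvu -> pdf pipeline; "book.djvu", "pages" and
# "index.djvu" are placeholder names.
have() { command -v "$1" >/dev/null 2>&1; }

if have djvmcvt && have convert && have gs; then
    mkdir -p pages
    # 1. split the multipage djvu into one file per page ("indirect" doc)
    djvmcvt -i book.djvu pages index.djvu
    cd pages || exit 1
    # 2. convert each page to a single-page PDF (straight raster copy)
    for f in *.djvu; do
        [ "$f" = index.djvu ] && continue
        convert "$f" "${f%.djvu}.pdf"
    done
    # 3. join the single-page PDFs with ghostscript
    gs -q -sPAPERSIZE=a4 -dBATCH -dNOPAUSE -sDEVICE=pdfwrite \
       -sOutputFile=../out.pdf *.pdf
else
    echo "djvmcvt, convert, or gs not found; skipping" >&2
fi
```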
This said, keep in mind that:

- This does just a straight conversion of a raster image; I guess the only thing you can tweak is the image quality, if it gets stored in the pdf using lossy compression (if `convert` is unable to do that, ghostscript has some options to tweak the output of `pdfwrite`, along the lines of `-dPDFSETTINGS=`; I'm not sure, but these may include the possibility of enforcing lossy compression and defining the quality level)
- This does not use djvu-specific knowledge; I guess the fact that djvu encodes foreground and background separately could be used to generate the PDF in a clever way that somehow exploits that to save some space
- PDF is for vectorial stuff; djvu is way better suited for rasterized documents than PDF