PDF => raster: is it possible to adapt the sampling resolution to the input page size?

I am using convert (an ImageMagick component that delegates to Ghostscript in the background) to turn the first page of PDF files into images.

Usually, convert -density 200 file.pdf[0] first_page.png will do the job, and it will sample the PDF file at 200 pixels per inch of paper.

However, it occasionally happens that a PDF is abnormally huge: sometimes A0 paper, and recently a PDF with a page of almost 22 m² (183 inches long by 185 inches wide).

For such files, convert hangs and eats CPU time. Images more than 35000 pixels wide and tall are simply not usable.
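To put numbers on this: at -density 200 the 185 x 183 inch page above becomes a 37000 x 36600 pixel raster. The sketch below does that arithmetic; the helper name is hypothetical, and the 3 bytes per pixel (8-bit RGB, no alpha) is an assumption. A 16-bit-per-channel ImageMagick build would need at least twice as much memory.

```python
# Rough size of the raster produced for a page of the given size at a given density.
# Assumption: 3 bytes per pixel (8-bit RGB, no alpha); real tools may use more.


def raster_size(width_in, height_in, dpi, bytes_per_pixel=3):
    """Return (width_px, height_px, uncompressed_bytes) for a page rendered at `dpi`."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px, height_px, width_px * height_px * bytes_per_pixel


if __name__ == "__main__":
    w, h, nbytes = raster_size(185, 183, 200)
    print(f"{w} x {h} px, about {nbytes / 1e9:.1f} GB uncompressed")
```

That is over 1.3 billion pixels for a single page, which explains why the conversion appears to hang.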

Hence my question: is there a switch in ImageMagick that adapts the density to the page size, or at least one that limits sampling to a portion of the page (for example, its top-left 30×30 inch corner)?

Thanks.

EDIT: MuPDF's official git repository now includes -w and -h switches that, together with -r, do what is wanted here.
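In current MuPDF releases the renderer is invoked as mutool draw, where -w and -h act as a maximum width and height in pixels when -r is also given. A minimal sketch that builds such a command for the first page; the helper name and file names are hypothetical, and it is worth checking the flags against the usage text of the installed mutool.

```python
import shlex


def mutool_first_page_cmd(pdf_path, out_png, dpi, max_w, max_h):
    """Build a `mutool draw` argument list that renders page 1 at `dpi`,
    scaled down if needed so it fits within max_w x max_h pixels."""
    return [
        "mutool", "draw",
        "-o", out_png,      # output file; format inferred from the extension
        "-r", str(dpi),     # requested resolution in dpi
        "-w", str(max_w),   # with -r given: maximum width in pixels
        "-h", str(max_h),   # with -r given: maximum height in pixels
        pdf_path,
        "1",                # page list: first page only (pages are 1-based)
    ]


if __name__ == "__main__":
    print(shlex.join(mutool_first_page_cmd("file.pdf", "first_page.png", 200, 2000, 2000)))
```

Passing the list to subprocess.run(cmd, check=True) runs it without going through a shell.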

Best Answer

I modified MuPDF's pdfdraw to support drawing in best-fit mode, so that I could state that the output must be at most 128x128 and it would fit the output into that box while maintaining the aspect ratio. Before that, the only way was to use pdfinfo to get the page size, do the calculations to fit it into the box, and then ask pdfdraw to draw it with that scale factor (dots per inch).

Long story short, the process itself is rather simple:

  1. Get the size of the page to render (in PDF terms, its media box). This can be done with pdfinfo and grep, which report it in pts (points, 1/72 of an inch), or with a PDF library such as pyPdf:

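    import pyPdf
    # /MediaBox is [x0, y0, x1, y1] in points, not [x, y, width, height]
    p = pyPdf.PdfFileReader(open("/home/dan/Desktop/Sieve-JFP.pdf", "rb"))
    x0, y0, x1, y1 = [float(v) for v in p.pages[0]['/MediaBox']]
    w, h = x1 - x0, y1 - y0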
    
  2. For a box fit, compute dpi = min( A/(w/72.), B/(h/72.) ),
    where A and B are the maximum output width and height in pixels, and w and h are the page width and height in points (dividing by 72 converts points to inches).

  3. Pass the result to convert: convert -density $dpi file.pdf[0] first_page.png
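The three steps can be sketched in Python as below. This is a sketch under assumptions: the regex expects the "Page size: W x H pts" line printed by poppler's pdfinfo (other versions may format it differently), the helper names are hypothetical, and a /Rotate of 90 or 270 (which swaps width and height) is not handled.

```python
import re
import shlex

# Matches pdfinfo lines such as "Page size:      612 x 792 pts (letter)"
# or, with -f/-l, "Page    1 size: 612 x 792 pts (letter)".
# Assumption: poppler's pdfinfo output format.
PAGE_SIZE_RE = re.compile(r"Page\s+(?:\d+\s+)?size:\s*([\d.]+)\s*x\s*([\d.]+)\s*pts")


def page_size_pts(pdfinfo_output):
    """Step 1: return (width, height) in points from pdfinfo's text output."""
    m = PAGE_SIZE_RE.search(pdfinfo_output)
    if m is None:
        raise ValueError("no 'Page size' line found")
    return float(m.group(1)), float(m.group(2))


def fit_density(w_pts, h_pts, max_w_px, max_h_px):
    """Step 2: largest dpi at which a w_pts x h_pts page fits the box.

    w_pts / 72 is the width in inches, so max_w_px / (w_pts / 72) is pixels per inch.
    """
    return min(max_w_px / (w_pts / 72.0), max_h_px / (h_pts / 72.0))


def convert_cmd(pdf_path, out_path, dpi):
    """Step 3: ImageMagick command rendering only the first page ([0])."""
    # Rounding down keeps the output inside the box; never go below 1 dpi.
    density = max(1, int(dpi))
    return ["convert", "-density", str(density), pdf_path + "[0]", out_path]


if __name__ == "__main__":
    sample = "Pages:          1\nPage size:      13320 x 13176 pts\n"  # 185 x 183 in
    w, h = page_size_pts(sample)
    dpi = fit_density(w, h, 2000, 2000)
    print(round(dpi, 2), shlex.join(convert_cmd("file.pdf", "first_page.png", dpi)))
```

Quoting the page selector (shlex.join does this) matters in a shell, since file.pdf[0] is otherwise a glob pattern.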

And, as requested, here is a slightly fudged git commit diff:

commit 0000000000000000000000000000000000000000
Author: Dan D.
Date:   Thu Jul 28 16:33:33 2011 -0400

    add options to pdfdraw to limit the output's width and height

    note that scaling must occur before rotation

diff --git a/apps/pdfdraw.c b/apps/pdfdraw.c
index 0000000..1234567 100644
--- a/apps/pdfdraw.c
+++ b/apps/pdfdraw.c
@@ -12,8 +12,10 @@
 #endif

 char *output = NULL;
-float resolution = 72;
+float resolution = -1;
 float rotation = 0;
+float width = -1;
+float height = -1;

 int showxml = 0;
 int showtext = 0;
@@ -47,6 +49,8 @@ static void usage(void)
        "\t\tsupported formats: pgm, ppm, pam, png, pbm\n"
        "\t-p -\tpassword\n"
        "\t-r -\tresolution in dpi (default: 72)\n"
+       "\t-w -\tmaximum width (default: no limit)\n"
+       "\t-h -\tmaximum height (default: no limit)\n"
        "\t-A\tdisable accelerated functions\n"
        "\t-a\tsave alpha channel (only pam and png)\n"
        "\t-b -\tnumber of bits of antialiasing (0 to 8)\n"
@@ -150,13 +154,39 @@ static void drawpage(pdf_xref *xref, int pagenum)

    if (output || showmd5 || showtime)
    {
-       float zoom;
+       float zoom = 1.0;
        fz_matrix ctm;
        fz_bbox bbox;
        fz_pixmap *pix;
+       float W, H;

-       zoom = resolution / 72;
-       ctm = fz_translate(0, -page->mediabox.y1);
+       ctm = fz_identity;
+       ctm = fz_concat(ctm, fz_translate(0, -page->mediabox.y1));
+       ctm = fz_concat(ctm, fz_rotate(page->rotate));
+       ctm = fz_concat(ctm, fz_rotate(rotation));
+       bbox = fz_round_rect(fz_transform_rect(ctm, page->mediabox));
+
+       W = bbox.x1 - bbox.x0; 
+       H = bbox.y1 - bbox.y0;
+       if (resolution != -1)
+           zoom = resolution / 72;
+       if (width != -1) 
+       {
+           if (resolution != -1)
+               zoom = MIN(zoom, width/W);
+           else
+               zoom = width/W;
+       }
+       if (height != -1)
+       {
+           if (resolution != -1 || width != -1)
+               zoom = MIN(zoom, height/H);
+           else
+               zoom = height/H;
+       }
+
+       ctm = fz_identity;
+       ctm = fz_concat(ctm, fz_translate(0, -page->mediabox.y1));
        ctm = fz_concat(ctm, fz_scale(zoom, -zoom));
        ctm = fz_concat(ctm, fz_rotate(page->rotate));
        ctm = fz_concat(ctm, fz_rotate(rotation));
@@ -295,7 +325,7 @@ int main(int argc, char **argv)
    fz_error error;
    int c;

-   while ((c = fz_getopt(argc, argv, "o:p:r:R:Aab:dgmtx5")) != -1)
+   while ((c = fz_getopt(argc, argv, "o:p:r:R:w:h:Aab:dgmtx5")) != -1)
    {
        switch (c)
        {
@@ -303,6 +333,8 @@ int main(int argc, char **argv)
        case 'p': password = fz_optarg; break;
        case 'r': resolution = atof(fz_optarg); break;
        case 'R': rotation = atof(fz_optarg); break;
+       case 'w': width = atof(fz_optarg); break;
+       case 'h': height = atof(fz_optarg); break;
        case 'A': accelerate = 0; break;
        case 'a': savealpha = 1; break;
        case 'b': alphabits = atoi(fz_optarg); break;
@@ -321,6 +353,10 @@ int main(int argc, char **argv)
    if (fz_optind == argc)
        usage();

+   if (width == -1 && height == -1)
+       if (resolution == -1)
+           resolution = 72;
+
    if (!showtext && !showxml && !showtime && !showmd5 && !output)
    {
        printf("nothing to do\n");
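To make the rule in the patch easier to follow, here is a Python transliteration of its zoom selection. It is a sketch: the function and parameter names are illustrative, not MuPDF's, and None stands in for the C code's -1 ("not given"). A resolution, if given, acts as an upper bound that -w and -h can only shrink.

```python
def choose_zoom(page_w, page_h, resolution=None, max_w=None, max_h=None):
    """Mirror of the zoom selection in the patched drawpage().

    page_w, page_h: page size in points after rotation (W and H in the diff).
    Returns the scale factor; output pixels are roughly zoom * page size.
    """
    if resolution is None and max_w is None and max_h is None:
        resolution = 72  # main() falls back to 72 dpi when nothing is given
    zoom = 1.0
    if resolution is not None:
        zoom = resolution / 72
    if max_w is not None:
        zoom = min(zoom, max_w / page_w) if resolution is not None else max_w / page_w
    if max_h is not None:
        limited = resolution is not None or max_w is not None
        zoom = min(zoom, max_h / page_h) if limited else max_h / page_h
    return zoom


if __name__ == "__main__":
    # US Letter (612 x 792 pt) asked for at 200 dpi, but boxed into 128 x 128 px.
    print(choose_zoom(612, 792, resolution=200, max_w=128, max_h=128))
```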