Combine parts of pages of a PDF document

mergepdf

I have a PDF in which some content is split across two pages.

The first part takes up at most the bottom half of the first page, and the second part takes up less than the top half of the second page. For example, if x is the wanted content on the first page, y is the wanted content on the second page, and - is content I don't want in the output document, we have:

|-|  |y|
|-|  |y|
|-|  |-|
|x|  |-|
|x|  |-|

And I'd like to have

|x|
|x|
|y|
|y|

on one page.

Is it possible to merge these parts that way on Linux?

Best Answer

I believe you should be able to adapt this script to do what you want:

It is somewhat utility-heavy, using:

  • pdfinfo - get the page count and dimensions.
  • gs (Ghostscript) - crop boxes out of the pages.
  • pdftk - collate the cropped pages into one PDF.
  • pdfjam - put 2 pages onto 1.
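Since the script depends on four separate tools, it may be worth checking that they are all on your PATH before running it. A small sketch using the POSIX `command -v` builtin; the function name `missing_tools` is hypothetical and not part of the script:

```shell
#!/bin/sh
# Hypothetical helper: print the name of every given command
# that cannot be found on PATH, one per line.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Warn (on stderr) about any of the script's dependencies that are missing.
missing=$(missing_tools pdfinfo gs pdftk pdfjam)
if [ -n "$missing" ]; then
    printf 'Missing: %s\n' $missing >&2
fi
```

An empty result means all four tools were found.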

As it stands, it only works when the top and bottom boxes are the same size (the split is hard-coded as offs=50, i.e. 50%). With some tweaking you should be able to make it work for e.g. a 70%/30% split, or whatever you need.
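For a split other than 50/50, the top and bottom boxes need their own heights and offsets. A minimal sketch of that arithmetic, assuming the same pixel conversion the script uses (pixels = points * 10); `band_geometry` is a hypothetical helper, not part of the script. It prints the box height in pixels (for `-g`) and the downward shift in points (for `PageOffset`) that keep the band between two percentages of the page height, measured from the bottom edge because the PDF origin is the bottom-left corner:

```shell
#!/bin/sh
# Hypothetical helper: for a page h points tall, print
#   "<box height in pixels> <PageOffset shift in points>"
# for the horizontal band between lower% and upper% of the page,
# both measured from the bottom edge.
band_geometry() {
    local h=$1 lower=$2 upper=$3
    local lo_pt=$(( h * lower / 100 ))
    local hi_pt=$(( h * upper / 100 ))
    # Assumes 10 pixels per point, as in the script (720 dpi / 72 pt per inch).
    printf '%d %d\n' $(( (hi_pt - lo_pt) * 10 )) "$lo_pt"
}

# Example for a US Letter page (792 pt high): keep the bottom 70%
# of page 1 and the top 30% of page 2.
band_geometry 792 0 70     # prints: 5540 0
band_geometry 792 70 100   # prints: 2380 554
```

The first number would go into `-g${pix_w}x...` and the second into `<</PageOffset [0 -...]>> setpagedevice` of the corresponding gs call; the 50/50 case (`band_geometry 792 50 100`) reproduces the script's own top box.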


pdf50x50:

#!/bin/bash

if ! [ -r "$1" ]; then
    printf "Unable to read file \`%s'\n" "$1" >&2
    exit 1
fi
fn_in="$1"

# A counter that gives each run's output files unique names.
# NOTE: the counter is stored in the file .pdftestnr in the working directory.
fn_nr=.pdftestnr

[ -r $fn_nr ] && nr=$(<$fn_nr) || nr=0
((++nr))
printf %d $nr > $fn_nr

# File names.
fn_top=$(printf "top-%03d.pdf" $nr)
fn_bottom=$(printf "bottom-%03d.pdf" $nr)
fn_combi=$(printf "combi-%03d.pdf" $nr)
fn_fine=$(printf "fine-%03d.pdf" $nr)

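# Get the page count and the page size (in points) of the first page.
# awk prints all three values on one line, so read gets all of them, and
# truncates fractional sizes (A4 is about 595.276 x 841.89 pts), because
# bash arithmetic only handles integers.
read -r p w h <<<"$(pdfinfo "$fn_in" | awk '/^Pages:/{p=$2} /^Page size:/{w=$3; h=$5} END{printf "%d %d %d\n", p, w, h}')"
# Convert to device pixels: pdfwrite defaults to 720 dpi,
# so 1 pt (1/72 inch) is 10 pixels.
((pix_w = w * 10))
((pix_h = h * 10))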

printf "Size %dx%d pts of %d pages\n" $w $h $p

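# Percentage of the page height cut away from each box (50 = two halves).
offs=50

# Convert the cut to points and compute the height of each box in pixels.
((offs = h * offs / 100))
((pix_crop_h = pix_h - offs * 10))

printf "Box height %d px, offset %d pts\n" $pix_crop_h $offs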

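# Extract the top box of every page to its own PDF: the media is
# pix_crop_h pixels high and the page content is shifted down by
# offs points, so only the top part of each page remains visible.
gs \
    -o "$fn_top" \
    -sDEVICE=pdfwrite \
    -g${pix_w}x$pix_crop_h \
    -c "<</PageOffset [0 -$offs]>> setpagedevice" \
    -f "$fn_in"

# Extract the bottom box of every page to its own PDF: no shift,
# so the media keeps the bottom part of each page.
gs \
    -o "$fn_bottom" \
    -sDEVICE=pdfwrite \
    -g${pix_w}x$pix_crop_h \
    -c "<</PageOffset [0 0]>> setpagedevice" \
    -f "$fn_in"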


# Combine the bottom box of page 1 (x) and the top box of page 2 (y)
# into one two-page file.
pdftk \
  A="$fn_top" \
  B="$fn_bottom" \
  cat B1 A2 \
  output "$fn_combi" \
  verbose

# Stack the 2 pages on one page (1 column, 2 rows), first page on top.
pdfjam "$fn_combi" --nup 1x2 --outfile "$fn_fine"
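The dimension-reading step is the most fragile part: pdfinfo reports fractional sizes for many paper formats (an A4 page is about 595.276 x 841.89 pt), while bash arithmetic only handles integers. A minimal sketch of a standalone parser that prints all three values on a single line and truncates the sizes; the function name `parse_pdfinfo` and the sample text are illustrative assumptions (the sample mimics pdfinfo's usual "Pages:" and "Page size:" lines):

```shell
#!/bin/sh
# Hypothetical helper: read pdfinfo-style text on stdin and print
# "pages width height" on one line, truncating fractional point sizes
# to integers so they can be used in shell arithmetic.
parse_pdfinfo() {
    awk '/^Pages:/     { p = $2 }
         /^Page size:/ { w = $3; h = $5 }
         END           { printf "%d %d %d\n", p, w, h }'
}

# Sample input in the format pdfinfo prints (values for an A4 page).
sample='Pages:           2
Page size:       595.276 x 841.89 pts (A4)'

printf '%s\n' "$sample" | parse_pdfinfo   # prints: 2 595 841
```

Because the result is a single line, `read -r p w h` picks up all three values when fed with `pdfinfo file.pdf | parse_pdfinfo`.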