Ubuntu – Remove a horizontal top strip on all pages of a pdf file

I have a PDF file with about three lines of text at the top of every page. Is there a way to remove that 2cm horizontal strip from all pages of the file using a command-line tool? I am using Ubuntu.

Best Answer

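PDFjam should be able to do it. It should be installable on Ubuntu with sudo apt install pdfjam. Then move into the directory containing your PDF files and run the loop below. The four --trim values are the amounts cut from the left, bottom, right and top edges, in that order, so the 2cm for the top strip goes last:

for f in *.pdf; do pdfjam --keepinfo --trim "0mm 0mm 0mm 2cm" --clip true --suffix "cropped" "$f"; done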

That will create a cropped copy of each pdf file in the directory, where file.pdf becomes file-cropped.pdf. If you are satisfied those are correct, you can move them to a new directory (mv *-cropped.pdf newdir/) and delete the rest.
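
If the loop is run a second time in the same directory, it also picks up the file-cropped.pdf copies and crops them again. The following is a minimal sketch of a guard against that. It uses the pdfjam options from the answer, with the 2cm in the top position of --trim (the order is left, bottom, right, top). The function names cropped_name and crop_all are made up for this example.

```shell
#!/bin/sh
# Hypothetical helpers; only the pdfjam options come from the answer.

# Print the output name pdfjam uses with --suffix cropped.
cropped_name() {
    printf '%s-cropped.pdf\n' "${1%.pdf}"
}

# Crop every PDF in the current directory once, skipping earlier results.
crop_all() {
    for f in *.pdf; do
        [ -e "$f" ] || continue                      # no PDFs: the glob stayed literal
        case $f in *-cropped.pdf) continue ;; esac   # already a cropped copy
        [ -e "$(cropped_name "$f")" ] && continue    # its cropped copy already exists
        # --trim order is left bottom right top, so 2cm last removes the top strip
        pdfjam --keepinfo --trim "0mm 0mm 0mm 2cm" --clip true \
               --suffix cropped "$f"
    done
}
```

Calling crop_all in the directory then behaves like the one-line loop, except that files already processed are left alone.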

Related Solutions

PDF Conversion – How to Split Pages in PDF Using Command Line

Here's a small Python script using the old pyPdf library that does the job neatly. pyPdf only runs under Python 2, so the script needs a Python 2 interpreter. Save it in a script called un2up (or whatever you like), make it executable (chmod +x un2up), and run it as a filter (un2up <2up.pdf >1up.pdf).

#!/usr/bin/env python2
# Split each 2-up page into a left-half page and a right-half page.
import copy, sys
from pyPdf import PdfFileWriter, PdfFileReader
reader = PdfFileReader(sys.stdin)
writer = PdfFileWriter()
for i in range(reader.getNumPages()):
    p = reader.getPage(i)
    q = copy.copy(p)                    # second copy of the same page
    (w, h) = p.mediaBox.upperRight
    p.mediaBox.upperRight = (w/2, h)    # p now shows only the left half
    q.mediaBox.upperLeft = (w/2, h)     # q now shows only the right half
    writer.addPage(p)
    writer.addPage(q)
writer.write(sys.stdout)

Ignore any deprecation warnings; only the pyPdf maintainers need be concerned with those.

If the input is oriented in an unusual way, you may need to use different coordinates when truncating the pages. See "Why my code not correctly split every page in a scanned pdf?"

Just in case it's useful, here's my earlier answer which uses a combination of two tools plus some manual intervention:

Pdfjam (at least version 2.0), based on the pdfpages LaTeX package, to crop the pages;
Pdftk, to put the left and right halves back together.

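Both tools are needed because, as far as I can tell, pdfpages isn't able to apply two different transformations to the same page in one stream. The 14.85cm in the trim values is half of 29.7cm, the width of a landscape A4 sheet, so adjust it for other paper sizes. In the call to pdftk, replace 42 by the number of pages in the input document (2up.pdf).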

pdfjam -o odd.pdf --trim '0cm 0cm 14.85cm 0cm' --scale 1.141 2up.pdf
pdfjam -o even.pdf --trim '14.85cm 0cm 0cm 0cm' --scale 1.141 2up.pdf
pdftk O=odd.pdf E=even.pdf cat $(i=1; while [ $i -le 42 ]; do echo O$i E$i; i=$(($i+1)); done) output all.pdf
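
The hard-coded 42 can be avoided by asking pdftk for the page count. This is a minimal sketch, assuming the NumberOfPages line that pdftk dump_data prints. The helper names page_count and interleave are illustrative only.

```shell
#!/bin/sh
# Extract the page count from `pdftk FILE dump_data` output read on stdin.
page_count() {
    awk '/^NumberOfPages:/ { print $2; exit }'
}

# Print "O1 E1 O2 E2 ... On En" on one line for n pages ($1 = n).
interleave() {
    i=1
    out=
    while [ "$i" -le "$1" ]; do
        out="$out${out:+ }O$i E$i"
        i=$((i + 1))
    done
    printf '%s\n' "$out"
}

# Usage (not run here):
#   n=$(pdftk 2up.pdf dump_data | page_count)
#   pdftk O=odd.pdf E=even.pdf cat $(interleave "$n") output all.pdf
```

On pdftk versions that provide the shuffle operation, the interleaving can probably be left to pdftk itself: pdftk O=odd.pdf E=even.pdf shuffle O E output all.pdf.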

In case you don't have pdfjam 2.0, it's enough to have a PDFLaTeX installation with the pdfpages package (on Ubuntu, you need texlive-latex-recommended and perhaps texlive-fonts-recommended), and use the following driver file, driver.tex:

\batchmode
\documentclass{minimal}
\usepackage{pdfpages}
\begin{document}
\includepdfmerge[trim=0cm 0cm 14.85cm 0cm,scale=1.141]{2up.pdf,-}
\includepdfmerge[trim=14.85cm 0cm 0cm 0cm,scale=1.141]{2up.pdf,-}
\end{document}

Then run the following commands, replacing 42 by the number of pages in the input file (which must be called 2up.pdf):

pdflatex driver
pdftk driver.pdf cat $(i=1; pages=42; while [ $i -le $pages ]; do echo $i $(($pages+$i)); i=$(($i+1)); done) output 1up.pdf
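
The page list works because driver.pdf holds all N left halves first (pages 1 to N) and then all N right halves (pages N+1 to 2N), so original page i is rebuilt from pages i and N+i. A small sketch of that ordering on its own; driver_order is a hypothetical name:

```shell
#!/bin/sh
# Print "1 N+1 2 N+2 ... N 2N" for an input of N pages ($1 = N).
driver_order() {
    awk -v n="$1" 'BEGIN {
        for (i = 1; i <= n; i++)
            printf "%s%d %d", (i > 1 ? " " : ""), i, n + i
        print ""
    }'
}

# Usage (not run here):
#   pdftk driver.pdf cat $(driver_order 42) output 1up.pdf
```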

PDF – How to Split Each PDF Page into Two Pages Using the Command Line

This should work; it needs the pdftk tool (and Ghostscript).

A simple case:

Step One: Split into individual pages

 pdftk clpdf.pdf burst

This produces the files pg_0001.pdf, pg_0002.pdf, ... pg_NNNN.pdf, one for each page. It also produces doc_data.txt, which contains the page dimensions.

Step Two: Create left and right half pages

  # Width and height of the first page, in points, taken from doc_data.txt
  pw=$(grep PageMediaDimensions doc_data.txt | head -1 | awk '{print $2}')
  ph=$(grep PageMediaDimensions doc_data.txt | head -1 | awk '{print $3}')
  w2=$(( pw / 2 ))
  # gs -g expects device pixels; pdfwrite uses 10 pixels per point by default
  w2px=$(( w2*10 ))
  hpx=$((  ph*10 ))
  for f in  pg_[0-9]*.pdf ; do
   lf=left_$f
   rf=right_$f
   # Left half: no offset. Right half: shift the page left by w2 points.
   gs -o "${lf}" -sDEVICE=pdfwrite -g${w2px}x${hpx} -c "<</PageOffset [0 0]>> setpagedevice" -f "${f}"
   gs -o "${rf}" -sDEVICE=pdfwrite -g${w2px}x${hpx} -c "<</PageOffset [-${w2} 0]>> setpagedevice" -f "${f}"
  done
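
The factor of 10 comes from Ghostscript: -g takes the page size in device pixels, and the default resolution of the pdfwrite device is 720 dpi, which is 10 pixels per PDF point (72 points per inch). Note that $(( ... )) only handles integers, so a page size such as 595.28 by 841.89 points makes the arithmetic fail. The sketch below does the same computation in awk and rounds to whole pixels. The helper name half_geometry is an assumption of this example, not part of any tool.

```shell
#!/bin/sh
# Print the -g geometry (WIDTHxHEIGHT in device pixels) for half a page.
# $1 = page width in points, $2 = page height in points.
half_geometry() {
    awk -v w="$1" -v h="$2" 'BEGIN { printf "%.0fx%.0f\n", w / 2 * 10, h * 10 }'
}

# Usage (not run here):
#   gs -o left.pdf -sDEVICE=pdfwrite -g"$(half_geometry "$pw" "$ph")" \
#      -c "<</PageOffset [0 0]>> setpagedevice" -f page.pdf
```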

Step Three: Merge the left and right halves in page order to produce newfile.pdf, in which each page holds one half of an original page.

  ls -1 [lr]*_[0-9]*pdf | sort -n -k3 -t_ > fl
  pdftk `cat fl`  cat output newfile.pdf
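
The sort relies on the page number being the third underscore-separated field, with left_ sorting before right_ when the numbers tie. As a sketch, the same list can also be built directly from the pg_ files, which avoids parsing ls output and the temporary file fl. It assumes the zero-padded pg_NNNN.pdf names shown above, so that the glob expands in page order; merge_list is a hypothetical helper.

```shell
#!/bin/sh
# For each page file given, print its left half and then its right half.
merge_list() {
    for page in "$@"; do
        printf 'left_%s\nright_%s\n' "$page" "$page"
    done
}

# Usage (not run here):
#   pdftk $(merge_list pg_[0-9]*.pdf) cat output newfile.pdf
```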

A more general case:

The example above assumes all pages are the same size. The doc_data.txt file contains the size of each split page. If the command

grep PageMediaDimensions <doc_data.txt | sort | uniq | wc -l

does not return 1, then the pages have different dimensions and some extra logic is needed in Step Two.
If the split is not exactly 50:50, then a better formula than w2=$(( pw / 2 )), used in the example above, is needed.
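
For use inside a script, the same check can be phrased as a yes/no test. A minimal sketch, with all_same_size as a made-up name; it reads the contents of doc_data.txt on standard input:

```shell
#!/bin/sh
# Succeed (exit status 0) when at most one distinct PageMediaDimensions
# line appears on stdin, fail (exit status 1) otherwise.
all_same_size() {
    awk '/PageMediaDimensions/ { seen[$0] = 1 }
         END { n = 0; for (k in seen) n++; exit (n > 1) }'
}

# Usage (not run here):
#   if all_same_size <doc_data.txt; then echo "simple case"; fi
```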

This second example shows how to handle this more general case.

Step One: split with pdftk as before

Step Two: Now create three files that contain the width of each page, the height of each page, and a default for the fraction of the page that the left half will use.

  grep PageMediaDimensions <doc_data.txt | awk '{print $2}'    > pws.txt
  grep PageMediaDimensions <doc_data.txt | awk '{print $3}'    > phs.txt
  grep PageMediaDimensions <doc_data.txt | awk '{print "0.5"}' > lfrac.txt

The file lfrac.txt can be edited by hand if you know where individual pages should be split.
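
Since lfrac.txt holds one fraction per line, in page order, a single page can also be adjusted from the command line rather than in an editor. A sketch, where set_fraction is a hypothetical filter:

```shell
#!/bin/sh
# Copy stdin to stdout, replacing line $1 (the page number) with $2.
set_fraction() {
    awk -v page="$1" -v frac="$2" 'NR == page { print frac; next } { print }'
}

# Usage (not run here): give the left half of page 7 only 45% of the width
#   set_fraction 7 0.45 <lfrac.txt >lfrac.new && mv lfrac.new lfrac.txt
```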

Step Three: Now create the left and right split pages, using the different page sizes and (if edited) the different fractional locations for the split.

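#!/bin/bash
# Open the per-page width, height and split-fraction files on descriptors 3 to 5
exec 3<pws.txt
exec 4<phs.txt
exec 5<lfrac.txt

for f in  pg_[0-9]*.pdf ; do
 # Read this page's width, height (in points) and left fraction
 read -r pwloc <&3
 read -r phloc <&4
 read -r lfr <&5
 # Left width = fraction * page width, rounded to a whole point
 wl=$(printf "%.0f" "$(echo "($lfr) * $pwloc" | bc -l)")
 wr=$(( pwloc - wl ))
 lf=left_$f
 rf=right_$f
 # gs -g expects device pixels; pdfwrite uses 10 pixels per point by default
 hpx=$((  phloc*10 ))
 w2px=$(( wl*10 ))
 gs -o "${lf}" -sDEVICE=pdfwrite -g${w2px}x${hpx} -c "<</PageOffset [0 0]>> setpagedevice" -f "${f}"
 w2px=$(( wr*10 ))
 gs -o "${rf}" -sDEVICE=pdfwrite -g${w2px}x${hpx} -c "<</PageOffset [-${wl} 0]>> setpagedevice" -f "${f}"
done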
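
As a worked example of the width arithmetic: for a 612-point-wide page and a left fraction of 0.4, the left half is 0.4 * 612 = 244.8, rounded to 245 points, and the right half gets the remaining 612 - 245 = 367 points. The sketch below does the same computation in awk instead of bc; split_widths is a hypothetical helper.

```shell
#!/bin/sh
# Print "LEFT RIGHT": the widths in whole points of the two halves.
# $1 = page width in points, $2 = fraction of the width given to the left half.
split_widths() {
    awk -v w="$1" -v f="$2" 'BEGIN {
        wl = sprintf("%.0f", f * w) + 0   # left width, rounded to a whole point
        printf "%d %d\n", wl, w - wl      # right width is whatever remains
    }'
}

# Usage (not run here):
#   set -- $(split_widths "$pwloc" "$lfr"); wl=$1; wr=$2
```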

Step Four: This is the same merge step as in the previous, simpler, example.

  ls -1 [lr]*_[0-9]*pdf | sort -n -k3 -t_ > fl
  pdftk `cat fl`  cat output newfile.pdf