I have a PDF file that contains three lines of text at the top of every page. Is there any way to remove that 2 cm horizontal strip from all pages of the file using a command-line tool? I am using Ubuntu.
Ubuntu – Remove a horizontal top strip on all pages of a pdf file
pdf, Ubuntu
Related Solutions
Here's a small Python script using the old PyPdf library that does the job neatly. Save it in a script called un2up (or whatever you like), make it executable (chmod +x un2up), and run it as a filter (un2up <2up.pdf >1up.pdf).
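#!/usr/bin/env python
# Python 2 script for the legacy pyPdf library: reads a 2-up PDF on stdin
# and writes a 1-up PDF on stdout.
import copy, sys
from pyPdf import PdfFileWriter, PdfFileReader
input = PdfFileReader(sys.stdin)
output = PdfFileWriter()
for p in [input.getPage(i) for i in range(0,input.getNumPages())]:
    q = copy.copy(p)
    # copy.copy is shallow, so give q its own media box before editing either one
    q.mediaBox = copy.copy(p.mediaBox)
    (w, h) = p.mediaBox.upperRight
    p.mediaBox.upperRight = (w/2, h)   # p keeps the left half
    q.mediaBox.upperLeft = (w/2, h)    # q keeps the right half
    output.addPage(p)
    output.addPage(q)
output.write(sys.stdout)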
Ignore any deprecation warnings; only the PyPdf maintainers need be concerned with those.
If the input is oriented in an unusual way, you may need to use different coordinates when truncating the pages. See Why my code not correctly split every page in a scanned pdf?
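The coordinate arithmetic behind the script can be checked without any PDF library. The following is only an illustrative sketch: the helper names split_left_right and crop_top are hypothetical, boxes are plain (llx, lly, urx, ury) tuples in PDF points, and the page is assumed to be unrotated. The second helper applies the same idea to removing a strip from the top of a page, as asked in the question.

```python
CM_TO_PT = 72 / 2.54  # PDF user space defaults to 72 points per inch


def split_left_right(box):
    """Split a page box (llx, lly, urx, ury) into left and right halves."""
    llx, lly, urx, ury = box
    mid = (llx + urx) / 2  # also works when the box does not start at x = 0
    return (llx, lly, mid, ury), (mid, lly, urx, ury)


def crop_top(box, strip_cm):
    """Return the box with strip_cm centimetres removed from its top edge."""
    llx, lly, urx, ury = box
    new_top = ury - strip_cm * CM_TO_PT
    if new_top <= lly:
        raise ValueError("strip is taller than the page")
    return (llx, lly, urx, new_top)


if __name__ == "__main__":
    # Assumed example sizes: A4 landscape and A4 portrait, rounded to points.
    print(split_left_right((0, 0, 842, 595)))
    print(crop_top((0, 0, 595, 842), 2.0))
```

For a rotated or oddly oriented page the edges would have to be swapped accordingly, which is the situation the linked question deals with.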
Just in case it's useful, here's my earlier answer which uses a combination of two tools plus some manual intervention:
- Pdfjam (at least version 2.0), based on the pdfpages LaTeX package, to crop the pages;
- Pdftk, to put the left and right halves back together.
Both tools are needed because, as far as I can tell, pdfpages isn't able to apply two different transformations to the same page in one stream. In the call to pdftk, replace 42 by the number of pages in the input document (2up.pdf).
pdfjam -o odd.pdf --trim '0cm 0cm 14.85cm 0cm' --scale 1.141 2up.pdf
pdfjam -o even.pdf --trim '14.85cm 0cm 0cm 0cm' --scale 1.141 2up.pdf
pdftk O=odd.pdf E=even.pdf cat $(i=1; while [ $i -le 42 ]; do echo O$i E$i; i=$(($i+1)); done) output all.pdf
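The shell loop in the pdftk call only generates the page list O1 E1 O2 E2 and so on. As a rough sketch of the same interleaving, the Python below assembles the argument list without running anything; the function names interleave_args and pdftk_command are hypothetical, and the file names are the ones used in the commands above.

```python
def interleave_args(num_pages, odd_handle="O", even_handle="E"):
    """Return pdftk page references O1 E1 O2 E2 ... for num_pages input pages."""
    args = []
    for i in range(1, num_pages + 1):
        args.append(f"{odd_handle}{i}")   # left half of input page i
        args.append(f"{even_handle}{i}")  # right half of input page i
    return args


def pdftk_command(num_pages):
    """Assemble (but do not run) the pdftk call used in the text."""
    return (["pdftk", "O=odd.pdf", "E=even.pdf", "cat"]
            + interleave_args(num_pages)
            + ["output", "all.pdf"])


if __name__ == "__main__":
    print(" ".join(pdftk_command(3)))
```

Building the list in one place makes it easy to substitute the real page count for 42 before passing the command to a shell or to subprocess.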
If you don't have pdfjam 2.0, a PDFLaTeX installation with the pdfpages package is enough (on Ubuntu you need texlive-latex-recommended and perhaps texlive-fonts-recommended). Use the following driver file, driver.tex:
\batchmode
\documentclass{minimal}
\usepackage{pdfpages}
\begin{document}
\includepdfmerge[trim=0cm 0cm 14.85cm 0cm,scale=1.141]{2up.pdf,-}
\includepdfmerge[trim=14.85cm 0cm 0cm 0cm,scale=1.141]{2up.pdf,-}
\end{document}
Then run the following commands, replacing 42 by the number of pages in the input file (which must be called 2up.pdf):
pdflatex driver
pdftk driver.pdf cat $(i=1; pages=42; while [ $i -le $pages ]; do echo $i $(($pages+$i)); i=$(($i+1)); done) output 1up.pdf
This should work; it needs the pdftk tool (and ghostscript).
A simple case:
Step One: Split into individual pages
pdftk clpdf.pdf burst
This produces files pg_0001.pdf, pg_0002.pdf, ... pg_NNNN.pdf, one for each page. It also produces doc_data.txt, which contains the page dimensions.
Step Two: Create left and right half pages
# Page width and height in points, taken from the first page
pw=`cat doc_data.txt | grep PageMediaDimensions | head -1 | awk '{print $2}'`
ph=`cat doc_data.txt | grep PageMediaDimensions | head -1 | awk '{print $3}'`
w2=$(( pw / 2 ))
# -g takes device pixels; pdfwrite defaults to 720 dpi, i.e. 10 pixels per point
w2px=$(( w2*10 ))
hpx=$(( ph*10 ))
for f in pg_[0-9]*.pdf ; do
    lf=left_$f
    rf=right_$f
    # Left half: no offset. Right half: shift the page left by half its width.
    gs -o ${lf} -sDEVICE=pdfwrite -g${w2px}x${hpx} -c "<</PageOffset [0 0]>> setpagedevice" -f ${f}
    gs -o ${rf} -sDEVICE=pdfwrite -g${w2px}x${hpx} -c "<</PageOffset [-${w2} 0]>> setpagedevice" -f ${f}
done
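The arithmetic in Step Two can be sketched in Python to make the units explicit. This is an illustration under stated assumptions, not a replacement for the shell loop: the function name gs_half_page_commands is hypothetical, page dimensions are assumed to be whole numbers of points, and the factor of 10 assumes pdfwrite's default resolution of 720 dpi. The commands are only built, not executed.

```python
def gs_half_page_commands(page_file, width_pt, height_pt):
    """Build the two Ghostscript calls that cut one page into left/right halves."""
    half_pt = width_pt // 2  # same integer division as $(( pw / 2 ))
    # -g expects device pixels; at 720 dpi one point is ten pixels (assumption).
    geometry = f"-g{half_pt * 10}x{height_pt * 10}"
    commands = []
    for prefix, offset in (("left_", 0), ("right_", -half_pt)):
        commands.append([
            "gs", "-o", prefix + page_file, "-sDEVICE=pdfwrite", geometry,
            "-c", f"<</PageOffset [{offset} 0]>> setpagedevice",
            "-f", page_file,
        ])
    return commands


if __name__ == "__main__":
    for cmd in gs_half_page_commands("pg_0001.pdf", 842, 595):
        print(cmd)
```

Each returned list could be handed to subprocess.run once the values look right for the document at hand.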
Step Three: Merge the left and right halves, in order, to produce newfile.pdf, which contains one half page per PDF page.
ls -1 [lr]*_[0-9]*pdf | sort -n -k3 -t_ > fl
pdftk `cat fl` cat output newfile.pdf
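The merge only works if the halves arrive in the order left 1, right 1, left 2, right 2, and so on. A small sketch that makes this ordering explicit, assuming the left_pg_NNNN.pdf and right_pg_NNNN.pdf names produced in Step Two (the function name merge_order is hypothetical):

```python
import re


def merge_order(filenames):
    """Order left_/right_ half pages by page number, left half first."""
    def key(name):
        match = re.fullmatch(r"(left|right)_pg_(\d+)\.pdf", name)
        if match is None:
            raise ValueError(f"unexpected file name: {name}")
        side, number = match.groups()
        # Sort by page number, then put the left half before the right half.
        return int(number), 0 if side == "left" else 1
    return sorted(filenames, key=key)


if __name__ == "__main__":
    names = ["right_pg_0002.pdf", "left_pg_0001.pdf",
             "right_pg_0001.pdf", "left_pg_0002.pdf"]
    print(merge_order(names))
```

The resulting list can be placed between "pdftk" and "cat output newfile.pdf" in the same way as the contents of the file fl.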
A more general case:
The example above assumes all pages are the same size. The doc_data.txt file contains the size of each split page. If the command grep PageMediaDimensions <doc_data.txt | sort | uniq | wc -l does not return 1, then the pages have different dimensions and some extra logic is needed in Step Two.
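The same uniformity check can be sketched in Python. The line format "PageMediaDimensions: WIDTH HEIGHT" is an assumption inferred from the awk field numbers used above, and the function names are hypothetical.

```python
def page_dimensions(doc_data_text):
    """Extract (width, height) strings from PageMediaDimensions lines."""
    dims = []
    for line in doc_data_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0].startswith("PageMediaDimensions"):
            dims.append((fields[1], fields[2]))
    return dims


def all_same_size(doc_data_text):
    """True when every page reports the same media dimensions."""
    return len(set(page_dimensions(doc_data_text))) == 1


if __name__ == "__main__":
    # Made-up sample data in the assumed format.
    sample = ("PageMediaDimensions: 842 595\n"
              "PageMediaDimensions: 842 595\n")
    print(all_same_size(sample))
```

When all_same_size returns False, the per-page handling described in the second example is needed.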
If the split is not exactly 50:50, then a better formula than w2=$(( pw / 2 )), used in the example above, is needed.
This second example shows how to handle this more general case.
Step One: Split with pdftk as before.
Step Two: Now create three files that contain the width of each page, the height of each page, and a default for the fraction of the width that the left page will use.
grep PageMediaDimensions <doc_data.txt | awk '{print $2}' > pws.txt
grep PageMediaDimensions <doc_data.txt | awk '{print $3}' > phs.txt
grep PageMediaDimensions <doc_data.txt | awk '{print "0.5"}' > lfrac.txt
The file lfrac.txt can be hand-edited if information is available about where to split different pages.
Step Three: Now create the left and right split pages, using the different page sizes and (if edited) different fractional locations for the split.
#!/bin/bash
exec 3<pws.txt
exec 4<phs.txt
exec 5<lfrac.txt
for f in pg_[0-9]*.pdf ; do
    read <&3 pwloc
    read <&4 phloc
    read <&5 lfr
    wl=`echo "($lfr)"'*'"$pwloc" | bc -l`;wl=`printf "%0.f" $wl`
    wr=$(( pwloc - wl ))
    lf=left_$f
    rf=right_$f
    hpx=$(( phloc*10 ))
    w2px=$(( wl*10 ))
    gs -o ${lf} -sDEVICE=pdfwrite -g${w2px}x${hpx} -c "<</PageOffset [0 0]>> setpagedevice" -f ${f}
    w2px=$(( wr*10 ))
    gs -o ${rf} -sDEVICE=pdfwrite -g${w2px}x${hpx} -c "<</PageOffset [-${wl} 0]>> setpagedevice" -f ${f}
done
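The per-page width calculation in this loop (the bc and printf lines) amounts to rounding the left width to a whole point and giving the remainder to the right page. A minimal sketch of that calculation, with hypothetical function names and integer page widths assumed:

```python
def split_widths(page_width_pt, left_fraction=0.5):
    """Return (left, right) widths in whole points for one page."""
    if not 0 < left_fraction < 1:
        raise ValueError("left_fraction must be strictly between 0 and 1")
    left = round(left_fraction * page_width_pt)  # nearest whole point
    right = page_width_pt - left                 # remainder, so no width is lost
    return left, right


def plan_splits(widths, fractions):
    """Pair per-page widths with per-page fractions (as in pws.txt and lfrac.txt)."""
    if len(widths) != len(fractions):
        raise ValueError("need one fraction per page")
    return [split_widths(w, f) for w, f in zip(widths, fractions)]


if __name__ == "__main__":
    print(plan_splits([842, 600], [0.5, 0.25]))
```

The left width doubles as the PageOffset for the right half, as in the second gs call above.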
Step Four: This is the same merge step as in the previous, simpler, example.
ls -1 [lr]*_[0-9]*pdf | sort -n -k3 -t_ > fl
pdftk `cat fl` cat output newfile.pdf
Best Answer
PDFjam should be able to do it. It should be installable on Ubuntu with sudo apt install pdfjam. Then, move into the directory containing your PDF files and run:
That will create a cropped copy of each PDF file in the directory, where file.pdf becomes file-cropped.pdf. If you are satisfied those are correct, you can move them to a new directory (mv *-cropped.pdf newdir/) and delete the rest.
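One possible shape for such a pdfjam call, assembled in Python so it can be inspected before anything is run. This is a hedged sketch rather than the answer's own command: the function name pdfjam_crop_top_command is hypothetical, the 2 cm value comes from the question, and it assumes that --trim takes its four lengths in the order left, bottom, right, top (consistent with the trim values used in the related solutions above) and that --clip true discards the trimmed area.

```python
from pathlib import Path
import shlex


def pdfjam_crop_top_command(pdf_path, strip_cm=2.0, suffix="cropped"):
    """Build a pdfjam call that trims strip_cm from the top of every page."""
    source = Path(pdf_path)
    target = source.with_name(f"{source.stem}-{suffix}.pdf")
    # Assumed order of trim values: left bottom right top.
    trim = f"0cm 0cm 0cm {strip_cm:g}cm"
    return ["pdfjam", "--trim", trim, "--clip", "true",
            "--outfile", str(target), str(source)]


if __name__ == "__main__":
    # Print one command per PDF in the current directory; nothing is executed.
    for pdf in sorted(Path(".").glob("*.pdf")):
        if pdf.stem.endswith("-cropped"):
            continue  # skip output from an earlier run
        print(shlex.join(pdfjam_crop_top_command(pdf)))
```

It is worth checking the result on one file first, since pdfjam may place the trimmed page on its default paper size; the pdfpages option fitpaper may help there, though that is untested here.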