Linux – How to extract and/or remove the last page of a bunch of PDFs

linux pdf

One of our vendors started tacking an unnecessarily huge image onto the last page of the PDFs we get from them, and I need to trim it out. We have hundreds of these files, so going through them manually is prohibitive. What is the best way to automatically extract and then delete the last page of a PDF? Preferably as two separate steps: I still need to confirm via file size that I'm not deleting a last page that doesn't have the image. The OS is Linux.

I can extract it using Ghostscript, with something along the lines of gs -dFirstPage=5 -dLastPage=5, but I need to automate this; I can't go through every file and manually find out the number of its last page.

Any ideas?

Edit: To clarify, I simply want to split out/delete the last page. I don't want to remove just the image in it; I want to excise the last page, period.

Best Answer

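As @Daniel Andersson already commented, this can easily be done with pdftk. The first call reverses the page order (end-1 means "from the last page down to page 1"); the second reverses it back while stopping at page 2 of the reversed file, which drops the original last page: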

pdftk input.pdf cat end-1 output temp.pdf
pdftk temp.pdf  cat end-2 output output.pdf
rm temp.pdf

I don't know if it can be done with one call to pdftk though...

Edit: you could combine it with thanosk's answer and use (in bash):

pdftk input.pdf cat 1-$((last-1)) output output.pdf

once you have stored the page number of the last page in the variable $last.
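To run this over hundreds of files, the two ideas above can be combined into a loop. The following is only a sketch under some assumptions: pdftk is on the PATH, the page count is read from the NumberOfPages line of `pdftk dump_data`, and the `lastpage/` and `trimmed/` directory names and the `page_count` helper are made up for this example. The originals are left untouched, so the extracted last pages can be checked by file size before anything is deleted.

```shell
#!/bin/sh
# Sketch only: split the last page off every PDF in the current directory.
# Assumes pdftk is installed; lastpage/ and trimmed/ are illustrative names.

# Read `pdftk FILE dump_data` output on stdin and print the page count.
page_count() {
    awk '/^NumberOfPages:/ { print $2 }'
}

for f in *.pdf; do
    [ -e "$f" ] || continue                 # the glob matched nothing
    last=$(pdftk "$f" dump_data | page_count)
    [ "${last:-0}" -gt 1 ] || continue      # skip unreadable or one-page files
    mkdir -p lastpage trimmed
    # The last page on its own.
    pdftk "$f" cat "$last" output "lastpage/$f" || continue
    # Everything before the last page.
    pdftk "$f" cat "1-$((last - 1))" output "trimmed/$f" || continue
    # Report the size of the extracted page so outliers are easy to spot.
    printf '%s\t%s bytes\n' "$f" "$(wc -c < "lastpage/$f")"
done
```

Newer pdftk releases also document reverse page references (r1 for the last page, r2 for the one before it), which would make `cat 1-r2` a one-call variant; check your installed version's man page before relying on it.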

Related Solutions

PDF – How to Extract First Page from Multiple PDFs

Using pdftk...

On Mac and Linux, from the command line:

for file in *.pdf ; do pdftk "$file" cat 1 output "${file%.pdf}-page1.pdf" ; done

On Windows, you could create a batch file. Open up Notepad, paste this inside:

for %%I in (*.pdf) do "pdftk.exe" "%%I" cat 1 output "%%~nI-page1.pdf"

You may need to replace "pdftk.exe" with the full path to pdftk, e.g., "C:\Program Files\pdftk\pdftk.exe" or wherever it is installed. (I don't use Windows, so I don't know.)

Save it with an extension ending in .bat, drop it in the folder with the PDFs and double click.

You can do the same thing with Ghostscript, yes.

Let's see. For Mac and Linux (all one line):

for file in *.pdf ; do gs -dSAFER -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile="${file%.pdf}-page1.pdf" -dFirstPage=1 -dLastPage=1 "$file" ; done

I'm not exactly sure what the corresponding command would be for a Windows batch file. My best guess (I don't have Windows, so I can't test it):

for %%I in (*.pdf) do "C:\Program Files\gs\gs9.00\gswin32c.exe" -dSAFER -dNOPAUSE -dBATCH -sDEVICE#pdfwrite -sOutputFile#"%%~nI-page1.pdf" -dFirstPage#1 -dLastPage#1 "%%I"

Double-check that the path to your Ghostscript executable is right. Again, I haven't tested this, since I don't use Windows.

EDIT: OK, I just realized you probably don't want 500 1-page PDFs, but a single PDF that combines them all. Just run the above, and that will leave you with 500 1-page PDFs. To combine them using pdftk... on mac and linux:

pdftk *-page1.pdf cat output combined.pdf

I think it's probably the same on Windows, except maybe needing the full path to pdftk, as above. You could just add that line after the line above in your batch file.

With Ghostscript... on mac and linux:

gs -dSAFER -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile="combined.pdf" *-page1.pdf

And it's probably the same on Windows, except replacing "gs" at the beginning with the full path to gswin32c.exe, as above.

There may be a way for Ghostscript to do both in one step, but I'm too lazy to figure it out right now.

If the order in which to combine them is important, then we'll need more information.
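One common ordering problem is that the shell expands `*-page1.pdf` in plain lexicographic order, so `file10-page1.pdf` lands before `file2-page1.pdf`. A possible workaround, sketched below, sorts the names in natural (version) order first. It assumes a `sort` that supports `-V` (GNU coreutils does) and file names without newlines; the `natural_order` helper is a made-up name.

```shell
# Sketch only: merge the one-page PDFs in natural (numeric-aware) order.

# Print the given names one per line, sorted so that 2 comes before 10.
natural_order() {
    printf '%s\n' "$@" | sort -V
}

set -- *-page1.pdf
if [ -e "$1" ]; then
    old_ifs=$IFS
    IFS='
'
    set -f                            # no globbing while the list is re-split
    set -- $(natural_order "$@")      # split on newlines only
    set +f
    IFS=$old_ifs
    pdftk "$@" cat output combined.pdf
fi
```

If the desired order is something else (by date, say), only the body of `natural_order` would need to change.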

Linux – Quickly browse through many PDFs

I have the same need when I produce multiple plots and graphs (typically with R) that I want as .pdf files, for resolution and for LaTeX integration.

I have three suggestions, the first being the one I like the most (I just found it and am really excited about it).

impressive
- Install it:
```
sudo apt-get install impressive
```
- Then, from a terminal in your directory:
```
impressive -T0 -w *.pdf
```
- It will display your PDF files as a presentation. The -T0 option removes transitions (or, equivalently, -t None), and -w wraps the presentation, so you can return to the first slide from the last one.
  
  You may want to use the -f switch to avoid starting in fullscreen mode (anyway you can toggle to fullscreen hitting the "f" key).
  
  For zooming, position your mouse where you want to zoom in, and hit "z".

Now, to relate to your question, it unfortunately doesn't keep the viewing position, and the zoom feature is limited. Otherwise, I believe it is better to use impressive than merging into one file, in terms of memory usage.

Here is a quick fix using mupdf and a bash script: "Is there a way to quickly browse multiple pdfs in a directory?"

Otherwise, I am just discovering the great Zathura PDF reader. It is highly customizable, and I suspect there might be a way to write a plugin and bind keys to switch to the next PDF.
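For the mupdf route, the simplest thing I can think of is to open the files one after another. This is only a minimal sketch, not necessarily what the linked script does: `browse_pdfs` is a made-up helper name, and it relies on the viewer running in the foreground until it is closed. Like impressive, it does not carry the zoom or viewing position over to the next file.

```shell
# Sketch only: step through PDFs one at a time. Quitting the viewer
# (q in mupdf) opens the next file. browse_pdfs is a hypothetical helper.
browse_pdfs() {
    viewer=$1
    shift
    for f in "$@"; do
        [ -e "$f" ] || continue   # skip names that do not exist
        "$viewer" "$f"
    done
}

# Usage: browse_pdfs mupdf *.pdf
```

Passing the viewer as the first argument means the same loop can be tried with another reader, for example `browse_pdfs zathura *.pdf`.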
