Extract first page from multiple PDFs


I have about 500 PDFs, and I need to extract the first page of each. Those pages then have to go through a time-consuming conversion process, so I was hoping to save some time with a batch process that extracts just the first page from each of the 500 PDFs and places it in a new PDF. I've had a poke around Acrobat but can't find any real way of doing this for multiple files. Does anyone know of other programs or methods that could achieve this? Free and open source options are obviously preferable 🙂

EDIT: I've actually had some success using Ghostscript to extract just one page. I'm now looking at how to batch that: take the list of files and run the extraction on each one.
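For the batching part, here is a minimal sketch in POSIX shell that reads a list of files and runs the usual Ghostscript page-1 extraction on each. It assumes the list is a plain text file with one PDF path per line (no newlines inside file names); the function name `first_pages_from_list` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: extract page 1 of every PDF named in a list file
# (one path per line) with Ghostscript. Output goes next to each input
# as "<name>-page1.pdf".
first_pages_from_list() {
    list=$1
    # IFS= keeps leading/trailing spaces, -r keeps backslashes; the
    # "|| [ -n ... ]" also handles a last line without a trailing newline.
    while IFS= read -r file || [ -n "$file" ]; do
        [ -n "$file" ] || continue            # skip blank lines
        if [ ! -f "$file" ]; then
            echo "missing: $file" >&2
            continue
        fi
        # </dev/null makes sure gs cannot consume the rest of the list from stdin.
        gs -dSAFER -dNOPAUSE -dBATCH -sDEVICE=pdfwrite \
           -dFirstPage=1 -dLastPage=1 \
           -sOutputFile="${file%.pdf}-page1.pdf" "$file" </dev/null
    done < "$list"
}
```

For example, `printf '%s\n' *.pdf > list.txt` writes a list you can edit by hand, and `first_pages_from_list list.txt` then processes it.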

Best Answer

Using pdftk...

On Mac and Linux, from the command line:

for file in *.pdf ; do pdftk "$file" cat 1 output "${file%.pdf}-page1.pdf" ; done
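One caveat with that loop: its output files also end in .pdf, so running it a second time in the same folder processes the *-page1.pdf files as well. A sketch that writes the pages into a separate folder instead (the folder name `firstpages` and the function name are arbitrary choices, not anything pdftk requires):

```shell
#!/bin/sh
# Hypothetical helper: same pdftk command as the one-liner, but output
# goes to a separate folder so re-running never picks up its own results.
extract_first_pages_pdftk() {
    outdir=${1:-firstpages}
    mkdir -p "$outdir"
    for file in *.pdf; do
        # If no .pdf files exist, the pattern stays literally "*.pdf"; skip it.
        [ -f "$file" ] || continue
        pdftk "$file" cat 1 output "$outdir/${file%.pdf}-page1.pdf" ||
            echo "failed: $file" >&2
    done
}
```

Run `extract_first_pages_pdftk` from the folder containing the PDFs; the single pages end up in `firstpages/`.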

On Windows, you could create a batch file. Open up Notepad, paste this inside:

for %%I in (*.pdf) do "pdftk.exe" "%%I" cat 1 output "%%~nI-page1.pdf"

You may need to replace "pdftk.exe" with the full path to pdftk, e.g., "C:\Program Files\pdftk\pdftk.exe" or whatever it is. (I don't use Windows, so I don't know.)

Save it with a .bat extension, drop it in the folder with the PDFs, and double-click it.

You can do the same thing with Ghostscript, yes.

Let's see. For Mac and Linux (all one line):

for file in *.pdf ; do gs -dSAFER -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile="${file%.pdf}-page1.pdf" -dFirstPage=1 -dLastPage=1 "$file" ; done

I'm not exactly sure what the corresponding command would be for a Windows batch file. My best guess (I don't have Windows, so I can't test it):

for %%I in (*.pdf) do "C:\Program Files\gs\gs9.00\gswin32c.exe" -dSAFER -dNOPAUSE -dBATCH -sDEVICE#pdfwrite -sOutputFile#"%%~nI-page1.pdf" -dFirstPage#1 -dLastPage#1 "%%I"

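Double-check that the path to your Ghostscript executable is right. (The # signs stand in for =, which Ghostscript accepts on Windows to avoid trouble with = in batch files.) Again, I haven't tested this since I don't use Windows.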


EDIT: OK, I just realized you probably don't want 500 one-page PDFs, but a single PDF that combines them all. Just run the above, which will leave you with 500 one-page PDFs. To combine them with pdftk on Mac and Linux:

pdftk *-page1.pdf cat output combined.pdf

I think it's the same on Windows, except that you may need the full path to pdftk, as above. You could add that line to your batch file, after the extraction line.

With Ghostscript on Mac and Linux:

gs -dSAFER -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile="combined.pdf" *-page1.pdf

And it's probably the same on Windows, except replacing "gs" at the beginning with the full path to gswin32c.exe, as above.

There may be a way to get Ghostscript to do both in one step, but I'm too lazy to figure it out right now.
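One way to get a single step without relying on any special Ghostscript option is to wrap both commands in a small script: extract page 1 of each PDF into a temporary folder, merge those pages, then delete the folder. This is a sketch under those assumptions (the function name and the default output name are hypothetical); it still runs gs once per file plus once for the merge, not a single gs invocation.

```shell
#!/bin/sh
# Hypothetical helper: extract page 1 of every PDF in the current folder
# and merge the pages into one file, using temporary intermediate files.
first_pages_combined() {
    out=${1:-combined.pdf}
    tmpdir=$(mktemp -d) || return 1
    n=0
    for file in *.pdf; do
        [ -f "$file" ] || continue               # no PDFs: pattern left unexpanded
        [ "$file" = "$out" ] && continue         # don't feed an earlier result back in
        n=$((n + 1))
        # Zero-padded names keep the merge order equal to the loop (glob) order.
        page=$(printf '%s/%05d.pdf' "$tmpdir" "$n")
        gs -dSAFER -dNOPAUSE -dBATCH -sDEVICE=pdfwrite \
           -dFirstPage=1 -dLastPage=1 -sOutputFile="$page" "$file"
    done
    if [ "$n" -gt 0 ]; then
        gs -dSAFER -dNOPAUSE -dBATCH -sDEVICE=pdfwrite \
           -sOutputFile="$out" "$tmpdir"/*.pdf
    fi
    rm -rf "$tmpdir"                             # remove the single-page intermediates
}
```

Running `first_pages_combined` in the PDF folder leaves only `combined.pdf` behind, with no *-page1.pdf files to clean up.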

If the order in which to combine them is important, then we'll need more information.
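If the order does matter, one option is to put the page files into a list in the order you want and hand them to pdftk, which joins its inputs in command-line order. A sketch, assuming a text file with one file name per line; `combine_in_order` is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper: merge PDFs in the order given by a list file.
combine_in_order() {
    list=$1
    out=${2:-combined.pdf}
    set --                                  # reuse "$@" as a safe argument list
    while IFS= read -r f || [ -n "$f" ]; do
        [ -n "$f" ] || continue             # skip blank lines
        set -- "$@" "$f"
    done < "$list"
    if [ "$#" -eq 0 ]; then
        echo "no files listed in $list" >&2
        return 1
    fi
    # pdftk appends its input files in the order they appear here.
    pdftk "$@" cat output "$out"
}
```

For instance, run `printf '%s\n' *-page1.pdf > order.txt`, reorder the lines in a text editor, then run `combine_in_order order.txt combined.pdf`.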
