How to search PDFs and extract matching pages with Automator

automator pdf

I am trying to make an Automator workflow that will allow me to:

  1. Specify a folder to run the actions on
  2. Search all PDF files in that folder for a certain word (my client's name)
  3. Create a new PDF file with just those pages on which my client's name appears
  4. Save that file on the desktop

Thus far, I can do steps 1 and 2. But is there any way to see which pages the matches were on, or to create a new PDF from the matching pages?

Best Answer

I realise this is a year after you asked the question, but I liked the challenge. In summary, this is how I would accomplish it:

  • For every PDF in the folder, convert it to text.
  • Use a Perl command to search the text files for the keyword and return the matching page number(s).
  • Use a command-line tool to extract those page(s) from the PDF.
  • Merge the extracted pages.
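
As a rough illustration of how these steps fit together, here is a minimal Python sketch of the bookkeeping around steps 1 and 2. It is not part of the original workflow. It assumes each PDF has already been converted to a text file with the same name and a .txt extension in the same folder, and it takes the page-finding logic as a parameter. The names collect_matches and find_pages are hypothetical.

```python
from pathlib import Path
from typing import Callable, Dict, List


def collect_matches(
    folder: str, find_pages: Callable[[str], List[int]]
) -> Dict[str, List[int]]:
    """Map each PDF in `folder` to the page numbers reported by `find_pages`.

    Assumption: `name.pdf` has a converted sidecar file `name.txt` next to it.
    """
    results: Dict[str, List[int]] = {}
    for pdf in sorted(Path(folder).glob("*.pdf")):
        text_file = pdf.with_suffix(".txt")
        if not text_file.exists():
            continue  # this PDF has not been converted yet, so skip it
        pages = find_pages(text_file.read_text(errors="replace"))
        if pages:  # keep only PDFs with at least one matching page
            results[pdf.name] = pages
    return results
```

The resulting dictionary (PDF file name to list of page numbers) is what you would hand to whichever tool does the extraction and merging.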

You can do the first part easily enough with AppleScript or Automator.

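The Perl command to get the page numbers is shown here. Replace blah with your keyword. It expects the converted text to contain page markers of the form "--- Page N ---":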

perl -ne 'print "$1$2" if /blah/ .. /--- Page (\d+) ---(\n)/'
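
If the range operator (..) in that one-liner is hard to follow, the same idea can be sketched in Python. This is an illustrative stand-in, not the original command. It assumes, as the regex suggests, that each "--- Page N ---" marker sits on its own line after the text of page N. The function name matching_pages is made up for this example, and the keyword is matched as a plain, case-sensitive substring rather than as a regex.

```python
import re
from typing import List

# A marker line as the one-liner expects it: "--- Page 12 ---" at end of line.
PAGE_MARKER = re.compile(r"--- Page (\d+) ---$")


def matching_pages(text: str, keyword: str) -> List[int]:
    """Return the page numbers whose text contains `keyword`.

    Mirrors the Perl flip-flop: start "collecting" at a keyword hit and
    stop at the next page marker, recording that marker's number.
    """
    pages: List[int] = []
    in_range = False  # True between a keyword hit and the next marker
    for line in text.splitlines():
        if not in_range and keyword in line:
            in_range = True
        if in_range:
            marker = PAGE_MARKER.search(line)
            if marker:
                pages.append(int(marker.group(1)))
                in_range = False
    return pages
```

A page that mentions the keyword several times is reported only once, because the search stays "in range" until the page marker is reached.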

A command-line tool for extracting pages from a PDF file can be found at users.skynet.be/tools/

Finally, the single pages can be merged with Automator, or with the tools above.

Hope this helps.