How to find all modified PDF files


I have a folder of 6000+ PDF files (chapters, articles, etc.). I'm trying to weed out/sort those that I've just downloaded but never annotated. Is there a way to do this? Those PDFs that I've never annotated usually have the same "created" and "modified" dates, so I was thinking those criteria could be used (i.e., look for files whose modified date is later than/not the same as the created date), but I have no idea how to do that.

In other words, I need to be able to find any PDF on my computer that has been modified.

Thank you for any help!

Best Answer

Based on the info in the OP and the comments (the PDFs are annotated in Skim), this will do what you asked.

In Automator:

  • Create a new Workflow.
  • Add a Find Finder Items action.
    • With settings, e.g., Search (Documents)
    • (All) of the following are true
    • (Kind) (is) (PDF)
  • Add a Run AppleScript action.

    • Replace the default code with the example AppleScript code shown further below.

    • Note: If Skim is not in the /Applications folder, modify the value of the skimpdfPathFilename variable accordingly. You should not need to modify anything else, unless you want to change the value of the offsetInSeconds variable (set offsetInSeconds to 60 in the code). This variable helps find the files that really have been modified since they were created: when a file is first created, the difference between its creation date and modification date can range from 0 seconds upward, and it is not consistent because it depends on how the file was created. Adjust it as you see fit for your use case.
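The offsetInSeconds check boils down to a single comparison between two timestamps. As a minimal shell sketch, assuming both dates are already available as seconds since the epoch (the function name modified_after_creation is hypothetical, not part of Automator or Skim):

```shell
#!/bin/sh
# Hypothetical helper, not part of the original workflow.
# Succeeds (exit status 0) when the modification time is MORE than
# offset_seconds later than the creation time, mirroring the AppleScript test
#   if mDate > (cDate + offsetInSeconds)
# Arguments: creation_epoch modification_epoch [offset_seconds]
modified_after_creation() {
    created=$1
    modified=$2
    offset_seconds=${3:-60}   # same default as offsetInSeconds in the answer
    [ "$modified" -gt $((created + offset_seconds)) ]
}
```

For example, modified_after_creation 1000 1061 succeeds while modified_after_creation 1000 1060 does not: a file modified exactly 60 seconds after it was created is still treated as never modified.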

What the Workflow and example AppleScript code do:

  • Finds all PDF files in the target folder, including all subfolders.
    • This is done with the Find Finder Items action, whose output is passed to the Run AppleScript action.
  • Creates a list of the PDF files whose modification date is more than offsetInSeconds seconds after their creation date.
    • This is done in the first repeat loop. Files meeting the criterion are stored in modifiedFilesList for use in the next repeat loop.
  • Of those, creates a list of the files that have annotations made in Skim.
    • This is done using xattr to get the extended attributes of each file. If a file has the target extended attributes, a flag is set to true; otherwise it is set to false. The files flagged true go into annotatedSkimFilesList for use in the next repeat loop.
  • Embeds in place the annotations made to those files in Skim.
    • The skimpdf utility bundled with Skim is run on each file in annotatedSkimFilesList, embedding the annotations in place, so there is no need to export to a second file, then delete the original and replace it with the export.
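Outside Automator, the first two steps can be approximated from the command line. This is a minimal sketch under a few assumptions: it matches PDFs by the .pdf extension rather than by Kind, it reads creation (birth) and modification times with macOS/BSD stat -f '%B %m' (falling back to GNU stat -c '%W %Y' elsewhere), and the function names file_times and list_modified_pdfs are hypothetical:

```shell
#!/bin/sh
# Hypothetical helpers approximating steps 1 and 2 of the workflow.

# Print "<creation epoch> <modification epoch>" for one file.
# GNU stat (Linux): %W = birth time (0 if unknown), %Y = modification time.
# BSD stat (macOS): %B = birth (creation) time, %m = modification time.
file_times() {
    if stat -c '%Y' / >/dev/null 2>&1; then
        stat -c '%W %Y' "$1"
    else
        stat -f '%B %m' "$1"
    fi
}

# Print the path of every *.pdf file under a folder (case-insensitive,
# subfolders included) whose modification time is more than offset_seconds
# after its creation time. Paths containing newlines are not supported.
list_modified_pdfs() {
    dir=$1
    offset_seconds=${2:-60}
    find "$dir" -type f -iname '*.pdf' | while IFS= read -r f; do
        times=$(file_times "$f") || continue
        set -- $times   # $1 = creation time, $2 = modification time
        if [ "$2" -gt $(($1 + offset_seconds)) ]; then
            printf '%s\n' "$f"
        fi
    done
}
```

On macOS, list_modified_pdfs ~/Documents 60 would print one matching path per line. Where the file system does not record a birth time, GNU stat reports 0 and every PDF would be listed, so the check is only meaningful where creation dates exist.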

NOTE: While I have tested this and it works without issue for me, do not run it until you are sure you have a proper backup! You should also first test the workflow on a small sample of copied files placed outside the folder you will ultimately run it on.


Example AppleScript code:

on run {input, parameters}

    -- Path to Skim's skimpdf tool, single-quoted for use in a shell command.
    set skimpdfPathFilename to "'/Applications/Skim.app/Contents/SharedSupport/skimpdf'"

    -- Minimum seconds between creation and modification for a file to count as modified.
    set offsetInSeconds to 60
    set modifiedFilesList to {}
    set annotatedSkimFilesList to {}

    -- 1. Keep the files modified more than offsetInSeconds after they were created.
    repeat with i from 1 to count input
        set fileInfo to info for item i of input
        set cDate to creation date in fileInfo
        set mDate to modification date in fileInfo
        if mDate > (cDate + offsetInSeconds) then
            set end of modifiedFilesList to POSIX path of item i of input
        end if
    end repeat

    -- 2. Of those, keep the files that carry Skim's "_notes" extended attributes.
    repeat with i from 1 to count modifiedFilesList
        set withNotes to (do shell script "xattr " & quoted form of item i in modifiedFilesList ¬
            & " | [ $(grep -c \".*_notes$\") -ge 1 ] && printf 'true' || printf 'false'") as boolean
        if withNotes then
            set end of annotatedSkimFilesList to item i in modifiedFilesList
        end if
    end repeat

    -- 3. Embed the Skim annotations in place with skimpdf.
    repeat with i from 1 to count annotatedSkimFilesList
        do shell script skimpdfPathFilename & space & "embed" & space & ¬
            quoted form of item i in annotatedSkimFilesList
    end repeat

end run
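The third repeat loop simply hands each annotated file to skimpdf embed. A shell sketch of the same loop, assuming the default Skim location used in skimpdfPathFilename (the SKIMPDF variable and the embed_skim_notes function are hypothetical names). Unlike do shell script, which raises an error and stops the workflow on a failing command, it reports the failure and continues with the remaining files:

```shell
#!/bin/sh
# Path to Skim's bundled skimpdf tool, as in the AppleScript code above;
# set SKIMPDF beforehand if Skim is installed elsewhere.
SKIMPDF=${SKIMPDF:-/Applications/Skim.app/Contents/SharedSupport/skimpdf}

# Hypothetical helper: run "skimpdf embed" on each file argument, the same
# command the third repeat loop issues. Failures are reported on stderr and
# make the function return 1, but the remaining files are still processed.
embed_skim_notes() {
    status=0
    for f in "$@"; do
        if ! "$SKIMPDF" embed "$f"; then
            printf 'embed failed: %s\n' "$f" >&2
            status=1
        fi
    done
    return "$status"
}
```

Setting SKIMPDF=echo before calling embed_skim_notes gives a dry run that only prints the arguments each skimpdf call would receive.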

Understanding the do shell script command in the second repeat loop:

When a PDF is annotated in Skim and saved, extended attributes are set on the file, e.g.:

$ xattr Filename.pdf 
com.apple.FinderInfo
net_sourceforge_skim-app_notes
net_sourceforge_skim-app_rtf_notes
net_sourceforge_skim-app_text_notes
$ 

The output is piped (|) to the following, shown here as the shell receives it, without the AppleScript string escapes:

[ $(grep -c ".*_notes$") -ge 1 ] && printf 'true' || printf 'false'

grep -c counts the attribute names matching the pattern .*_notes$. If the count is 1 or more, the command prints true and the withNotes variable is set to true; otherwise it prints false and withNotes is set to false.
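The same test can also be written with grep -q, which sets its exit status on the first match instead of counting lines. A sketch, where has_skim_notes is a hypothetical name and the sample input is the xattr listing shown above:

```shell
#!/bin/sh
# Hypothetical helper: reads attribute names (one per line, as printed by
# `xattr file`) on standard input and prints true if any name ends in
# "_notes", false otherwise. ".*_notes$" and "_notes$" match the same lines
# because a grep pattern is not anchored at the start of the line.
has_skim_notes() {
    if grep -q '_notes$'; then
        printf 'true'
    else
        printf 'false'
    fi
}
```

On macOS it would be used as xattr Filename.pdf | has_skim_notes. Both patterns match any attribute ending in _notes, not only Skim's; a stricter pattern such as '^net_sourceforge_skim-app_notes$' would ignore attributes set by other apps.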

Note that Skim does include a command line utility, /Applications/Skim.app/Contents/SharedSupport/skimnotes, that can be used to test whether a PDF has annotations made in Skim. However, because of its output, this utility is better suited to a shell script run in Terminal than to a do shell script command, which is why I used xattr and grep instead.


Note: The example AppleScript code above is just that, an example, and does not include any error handling as may be appropriate, needed, or wanted. The onus is upon the user to add any appropriate error handling to any example code presented and/or code written by oneself.