Automator to get PDF Annotations and export to Excel

applescript automator ms-office pdf

I'm trying to make a macOS service using Automator to extract PDF annotations and get the information into an Excel file.

I can successfully get the annotations from a PDF and export them to a text file. But if I try to send the same information to a new Excel file, it is all pasted into a single cell.

What would be the correct steps in Automator to put each line of text in a separate Excel row?

Best Answer

Assuming you know the basics of Automator, create a workflow with these actions:

[Image: screenshot of the Automator workflow actions]

In the Run AppleScript action, paste the following:

    on run {input, parameters}
        -- Split the extracted annotation text into one item per line
        set delimitedList to paragraphs of (input as string)
        set myExport to ""
        -- Make sure the CSV file exists so it can be referenced as an alias
        do shell script "touch /tmp/myFile.csv"
        repeat with myLines in delimitedList
            set myLineExport to ""
            -- Split each line into fields on this delimiter
            set AppleScript's text item delimiters to {"    "}
            set listItems to every text item of myLines
            repeat with eachItem in listItems
                -- Wrap each field in double quotes, followed by a comma
                set myLineExport to myLineExport & "\"" & eachItem & "\","
            end repeat
            -- End each CSV row with a line break
            set myExport to myExport & myLineExport & linefeed
        end repeat
        write_to_file(myExport, (POSIX file "/tmp/myFile.csv" as alias), false)
        return POSIX file "/tmp/myFile.csv" as alias
    end run

    -- Write this_data to target_file, replacing its contents unless append_data is true
    on write_to_file(this_data, target_file, append_data)
        try
            set the target_file to the target_file as string
            set the open_target_file to open for access file target_file with write permission
            if append_data is false then set eof of the open_target_file to 0
            write this_data to the open_target_file starting at eof
            close access the open_target_file
            return true
        on error
            -- Close the file if it was left open, ignoring any further error
            try
                close access file target_file
            end try
            return false
        end try
    end write_to_file
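
The script hands the CSV file on to the next action in the workflow. If you would rather open the file in Excel from a script, here is a minimal sketch. It assumes Microsoft Excel is installed under that application name and that the file was written to `/tmp/myFile.csv` as above; it is not part of the original workflow.

```applescript
-- Assumption: Microsoft Excel is installed and registered under this name.
-- /tmp/myFile.csv is the path written by the script above.
do shell script "open -a 'Microsoft Excel' /tmp/myFile.csv"
```

Because the file has a `.csv` extension, Excel should place each line in its own row and each quoted field in its own cell.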

Note: Annotations that contain straight double quotes (") may break the CSV, because the script wraps every field in double quotes without escaping any quotes inside it. Smart (curly) quotes do not cause this problem.
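
One possible way around the quote issue is to double every straight double quote inside a field, which is the usual CSV convention. The handler below is a sketch only: `escape_quotes` is a hypothetical helper name, not part of the script above, and it has not been tested against the output of the Automator PDF action.

```applescript
-- Hypothetical helper: double any straight double quotes in a field
-- so the field stays valid when wrapped in quotes in the CSV.
on escape_quotes(fieldText)
    -- Remember the current delimiters so they can be restored afterwards
    set savedDelimiters to AppleScript's text item delimiters
    -- Split the text at every straight double quote
    set AppleScript's text item delimiters to "\""
    set fieldParts to text items of (fieldText as string)
    -- Rejoin the pieces with two double quotes in place of each one
    set AppleScript's text item delimiters to "\"\""
    set escapedText to fieldParts as string
    set AppleScript's text item delimiters to savedDelimiters
    return escapedText
end escape_quotes
```

To use it, add the handler below `end run` and change `eachItem` to `escape_quotes(eachItem)` in the line that builds `myLineExport`.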

You can select your PDF in the first prompt.
