macOS – AppleScript: How to modify HTML and RTF data on the clipboard

applescript, copy/paste, html, macos, text

I am writing an AppleScript .scpt file, triggered system-wide by a key combination assigned in FastScripts.app, that adds parentheses around the selected editable text.

If the selected text happens to already be wrapped in parentheses, then I want my script to effectively delete the parentheses from the selection. This is where I need assistance. I do not want to strip any of the formatting from formatted text.

My script works if the selection-with-parentheses is plain text data, but not if it is RTF or HTML data.

Here is my full code:

use AppleScript version "2.4"
use scripting additions
use framework "Foundation"
use framework "AppKit"

(*
Get the selected text into an AppleScript, while preserving the original clipboard:
From: http://apple.stackexchange.com/questions/271161/how-to-get-the-selected-text-into-an-applescript-without-copying-the-text-to-th/
*)

-- Back up the original clipboard contents:
set savedClipboard to my fetchStorableClipboard()
set thePasteboard to current application's NSPasteboard's generalPasteboard()
set theCount to thePasteboard's changeCount()

-- Copy selected text to clipboard:
tell application "System Events" to keystroke "c" using {command down}

-- Check for changed clipboard:
repeat 20 times
    if thePasteboard's changeCount() is not theCount then exit repeat
    delay 0.1
end repeat

set firstCharacter to (character 1 of (the clipboard))
set lastCharacter to (character (length of (the clipboard))) of (the clipboard)

-- Remove the parentheses from the selection, if the selection is wrapped in parentheses:
if (firstCharacter is "(") and (lastCharacter is ")") then
    -- The selection already has parentheses.
    -- I must discern what class types are available for the clipboard content:
    tell current application
        set cbInfo to get (clipboard info) as string
        if cbInfo contains "RTF" then
            -- I need help here.
            -- Remove the first and last characters of the rich text, while retaining the rich text formatting.
        else if cbInfo contains "HTML" then
            -- I need help here.
            -- Remove the first and last characters of the HTML, while retaining formatting data.
        else
            -- The clipboard contains plain text.

            -- Remove the first and last character of a plain text string:
            set theSelectionWithoutParentheses to (text 2 thru -2 of (the clipboard))
            set the clipboard to theSelectionWithoutParentheses
            tell application "System Events" to keystroke "v" using {command down}
        end if
    end tell

else
    -- The selection needs parentheses.
    tell application "System Events" to keystroke "("
    delay 0.1
    tell application "System Events" to keystroke "v" using {command down}
    delay 0.1
    tell application "System Events" to keystroke ")"
end if

delay 1 -- Without this delay, may restore clipboard before pasting.

-- Restore clipboard:
my putOnClipboard:savedClipboard

on fetchStorableClipboard()
    set aMutableArray to current application's NSMutableArray's array() -- used to store contents
    -- get the pasteboard and then its pasteboard items
    set thePasteboard to current application's NSPasteboard's generalPasteboard()
    -- loop through pasteboard items
    repeat with anItem in thePasteboard's pasteboardItems()
        -- make a new pasteboard item to store existing item's stuff
        set newPBItem to current application's NSPasteboardItem's alloc()'s init()
        -- get the types of data stored on the pasteboard item
        set theTypes to anItem's types()
        -- for each type, get the corresponding data and store it all in the new pasteboard item
        repeat with aType in theTypes
            set theData to (anItem's dataForType:aType)'s mutableCopy()
            if theData is not missing value then
                (newPBItem's setData:theData forType:aType)
            end if
        end repeat
        -- add new pasteboard item to array
        (aMutableArray's addObject:newPBItem)
    end repeat
    return aMutableArray
end fetchStorableClipboard

on putOnClipboard:theArray
    -- get pasteboard
    set thePasteboard to current application's NSPasteboard's generalPasteboard()
    -- clear it, then write new contents
    thePasteboard's clearContents()
    thePasteboard's writeObjects:theArray
end putOnClipboard:

Best Answer

With some help from

I have devised a solution to modify HTML or RTF clipboard content, and then put this modified content on the clipboard.


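While we're at it, here's another method to remove parentheses from plain text clipboard data. Note that tr -d '()' deletes every parenthesis in the text, not only the outer pair, so it is only suitable when the selection contains no inner parentheses: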

do shell script "pbpaste | tr -d '()' | pbcopy"
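Because tr -d '()' deletes every parenthesis, here is a minimal shell sketch that strips only the outer pair and leaves inner parentheses alone. The function name strip_outer_parens is hypothetical (it is not part of the original answer), and the sketch assumes the text is small enough to hold in a shell variable:

```shell
# Hypothetical helper: remove one leading "(" and one trailing ")" from stdin,
# but only when the text both starts with "(" and ends with ")".
strip_outer_parens() {
    # Command substitution drops any trailing newlines from the input.
    text=$(cat)
    case $text in
        "("*")")
            text=${text#"("}   # drop the first character, the opening parenthesis
            text=${text%")"}   # drop the last character, the closing parenthesis
            ;;
    esac
    printf '%s' "$text"
}
```

On macOS this could be used as pbpaste | strip_outer_parens | pbcopy. Inside do shell script, the function definition would have to be part of the same command string.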

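To remove parentheses from HTML clipboard data (while preserving the formatting data), the following one-liner reads the clipboard as «class HTML» hex data, uses awk to replace the «data HTML prefix and the closing » with the hex codes for <html> and </html>, decodes the hex with xxd -r -p, converts the HTML to RTF with textutil, and puts the result back on the clipboard. As with the plain text version, tr -d '()' removes every parenthesis, not only the outer pair: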

do shell script "osascript -e 'try' -e 'get the clipboard as «class HTML»' -e 'end try' | awk '{sub(/«data HTML/, \"3C68746D6C3E\") sub(/»/, \"3C2F68746D6C3E\")} {print}' | xxd -r -p | textutil -convert rtf -stdin -stdout | tr -d '()' | pbcopy"

Method 1 and Method 2 below both remove the outer parentheses from RTF clipboard data (while preserving the formatting data) by manipulating the hexadecimal representation of the RTF clipboard content:

Method 1:

try
    -- Get the RTF clipboard data in hexadecimal form:
    set theOriginalHexData to do shell script "osascript -e 'try' -e 'get the clipboard as «class RTF »' -e 'end try'"
on error eStr number eNum
    display dialog eStr & " number " & eNum buttons {"OK"} default button 1 with icon caution
    error number -128 (* user cancelled *)
end try

-- I don't want any parentheses inside of the outer parentheses to be removed; they must be preserved. So...

-- Check to see if there is more than 1 instance of "(":    
set originalDelimiters to AppleScript's text item delimiters
set AppleScript's text item delimiters to "28" -- The hex code for the character: "("
set theHexInAListSeparatedByOpeningParentheses to text items of theOriginalHexData
set numberOfOpeningParentheses to ((count theHexInAListSeparatedByOpeningParentheses) - 1)
set AppleScript's text item delimiters to originalDelimiters

if numberOfOpeningParentheses is 1 then
-- There are zero inner opening-parentheses.
    set theModifiedHexData to (item 1 of theHexInAListSeparatedByOpeningParentheses) & (item 2 of theHexInAListSeparatedByOpeningParentheses)

else if numberOfOpeningParentheses is greater than 1 then
-- There is at least one inner opening-parenthesis.
    set theModifiedHexData to (item 1 of theHexInAListSeparatedByOpeningParentheses) & (item 2 of theHexInAListSeparatedByOpeningParentheses)
    set counter to 2
    repeat until (counter is greater than numberOfOpeningParentheses)
        -- Add the desired inner opening-parentheses back into the string:
        set theModifiedHexData to (theModifiedHexData & "28" & (item (counter + 1) of theHexInAListSeparatedByOpeningParentheses))
        set counter to counter + 1
    end repeat
end if


-- Check to see if there is more than 1 instance of ")":
set originalDelimiters to AppleScript's text item delimiters
set AppleScript's text item delimiters to "29" -- The hex code for the character: ")"
set theHexInAListSeparatedByClosingParentheses to text items of theModifiedHexData
set numberOfClosingParentheses to ((count theHexInAListSeparatedByClosingParentheses) - 1)
set AppleScript's text item delimiters to originalDelimiters

if numberOfClosingParentheses is 1 then
-- There are zero inner closing-parentheses.
    set theModifiedHexData to (item 1 of theHexInAListSeparatedByClosingParentheses) & (item 2 of theHexInAListSeparatedByClosingParentheses)

else if numberOfClosingParentheses is greater than 1 then
-- There is at least one inner closing-parenthesis.
    set theModifiedHexData to (item 1 of theHexInAListSeparatedByClosingParentheses)
    set counter to 2
    repeat until ((counter) is greater than numberOfClosingParentheses)
        -- Add the desired inner closing-parentheses back into the string:
        set theModifiedHexData to (theModifiedHexData & "29" & (item (counter) of theHexInAListSeparatedByClosingParentheses))
        set counter to counter + 1
    end repeat
    set theModifiedHexData to (theModifiedHexData & (item (counter) of theHexInAListSeparatedByClosingParentheses))
end if

-- Put the modified hex code onto the clipboard, as RTF data:
try
    do shell script "osascript -e 'set the clipboard to " & theModifiedHexData & "'"
on error eStr number eNum
    display dialog eStr & " number " & eNum buttons {"OK"} default button 1 with icon caution
    error number -128 (* user cancelled *)
end try

Method 1 for modifying RTF clipboard data is conceivably susceptible to false positives. Because it searches the hex code as one continuous string, it can match a target code that straddles two adjacent hex pairs (the second character of one pair followed by the first character of the next).

For example, the pairs 32 and 85, when side by side in the hex code, form the string 3285, which contains 28. Method 1 would therefore treat it as an opening parenthesis. Clearly, this is undesirable.
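To make the difference concrete, here is a small shell sketch that compares a plain substring search with a pair-wise search on the same hex string. The function names contains_substring and contains_pair are hypothetical and only serve this illustration:

```shell
# Substring search, as in Method 1: ignores where the pair boundaries are.
contains_substring() {
    case $1 in
        *"$2"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Pair-wise search: split the hex into two-character lines first,
# then look for a line that is exactly the wanted pair.
contains_pair() {
    printf '%s\n' "$1" | fold -w 2 | grep -qx -- "$2"
}

hex="3285"   # the pairs 32 and 85, side by side

if contains_substring "$hex" 28; then
    echo "substring search: found 28 (false positive)"
fi
if ! contains_pair "$hex" 28; then
    echo "pair-wise search: no 28"
fi
```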

Method 2:

Method 2 solves the issue, making a false positive like this impossible. This is because, before Method 2 analyzes the hex code, it first splits the hex code text into a list of two-character items (stored in hexCodeInBinaryList).

Unlike Method 1, Method 2 interprets the hex code in pairs. Therefore, Method 2 is technically better:

global hexCodeInBinaryList

set firstCharacterHEX to "28"
set lastCharacterHEX to "29"


try
    -- Get the RTF clipboard data in hexadecimal form:
    set theOriginalHexData to do shell script "osascript -e 'try' -e 'get the clipboard as «class RTF »' -e 'end try'"
on error eStr number eNum
    display dialog eStr & " number " & eNum buttons {"OK"} default button 1 with icon caution
    error number -128 (* user cancelled *)
end try


-- I don't want any parentheses inside of the outer parentheses to be removed; they must be preserved. So...

-- Make sure that any parentheses that come after the first parenthesis are preserved:
putStringIntoBinaryList(theOriginalHexData)
set counter to 1
set listContainingItemNumbers to {}

repeat until (counter is greater than (count of hexCodeInBinaryList))
    if ((item counter) of hexCodeInBinaryList) is firstCharacterHEX then
        set listContainingItemNumbers to listContainingItemNumbers & counter
    end if
    set counter to counter + 1
end repeat
set numberOfOpeningParentheses to (count of listContainingItemNumbers)

set theNewLocation to item 1 of listContainingItemNumbers
set theModifiedHexData to ((items 1 thru (theNewLocation - 1) of hexCodeInBinaryList) as text)
set theModifiedHexData to theModifiedHexData & ((items (theNewLocation + 1) thru (count of hexCodeInBinaryList) of hexCodeInBinaryList) as text)


-- Make sure that any parentheses that come before the last parenthesis are preserved:
putStringIntoBinaryList(theModifiedHexData)

set counter to 1
set listContainingItemNumbers to {}

repeat until (counter is greater than (count of hexCodeInBinaryList))
    if ((item counter) of hexCodeInBinaryList) is lastCharacterHEX then
        set listContainingItemNumbers to listContainingItemNumbers & counter
    end if
    set counter to counter + 1
end repeat
set numberOfClosingParentheses to (count of listContainingItemNumbers)

set theNewLocation to (item numberOfClosingParentheses of listContainingItemNumbers)
set theModifiedHexData to ((items 1 thru (theNewLocation - 1) of hexCodeInBinaryList) as text)
set theModifiedHexData to theModifiedHexData & ((items (theNewLocation + 1) thru (length of hexCodeInBinaryList) of hexCodeInBinaryList) as text)



try
    -- Put the modified hex code onto the clipboard, as RTF data:
    do shell script "osascript -e 'set the clipboard to " & theModifiedHexData & "'"
on error eStr number eNum
    display dialog eStr & " number " & eNum buttons {"OK"} default button 1 with icon caution
    error number -128 (* user cancelled *)
end try




on putStringIntoBinaryList(theStringToConvertToList)
    -- Split the string into a list of two-character items, stored in the global hexCodeInBinaryList.

    set totalNumberOfCharacters to length of theStringToConvertToList

    set startPoint to 1
    set hexCodeInBinaryList to {}

    -- Add every complete pair, including the final pair of an even-length string:
    repeat while (startPoint is less than totalNumberOfCharacters)
        set end of hexCodeInBinaryList to (text startPoint thru (startPoint + 1) of theStringToConvertToList)
        set startPoint to (startPoint + 2)
    end repeat

    -- If the string has an odd number of characters, add the final, unpaired character:
    if (startPoint is totalNumberOfCharacters) then
        set end of hexCodeInBinaryList to (character startPoint of theStringToConvertToList)
    end if

end putStringIntoBinaryList

However, note that Method 2 has been designed around the assumption that you are working with a two-character hex code delimiter.

If the hex code you are searching for is longer than 2 characters (such as the 8-character hex code for an opening double quotation mark, which is 5C273933), you would have to rewrite the putStringIntoBinaryList() subroutine accordingly (or use Method 1 instead, which is probably safe to use with a lengthy, 8-character hex code).
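The pair-wise idea behind Method 2 can also be sketched in shell: split the hex into two-character pairs, drop the first 28 and the last 29, and join the rest back together. The function name strip_outer_hex_parens is hypothetical, and the sketch assumes it is given only the bare hex digits (without the «data RTF » wrapper that osascript prints):

```shell
# Hypothetical helper: remove the first "28" pair and the last "29" pair
# from a string of hex digits, comparing whole pairs only.
strip_outer_hex_parens() {
    printf '%s\n' "$1" | fold -w 2 | awk '
        { pair[NR] = $0 }                          # remember every pair in order
        $0 == "28" && first == 0 { first = NR }    # position of the first "("
        $0 == "29" { last = NR }                   # position of the last ")"
        END {
            for (i = 1; i <= NR; i++)
                if (i != first && i != last) printf "%s", pair[i]
            printf "\n"
        }'
}

# 7B 28 61 28 62 29 29 7D is the text {(a(b))}
strip_outer_hex_parens 7B2861286229297D
```

Because only whole pairs are compared, a string such as 3285 passes through unchanged, and inner parentheses are kept.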