Convert HTML hyperlink code to Markdown (MD) in pure AppleScript

I have a text file or some text to process via the clipboard. It contains a few lines of HTML code, and I'd like to convert it to pure Markdown.

E.g. from:

This is a link: <a href="https://duck.com">link</a>

to:

This is a link: [link](https://duck.com)

Is there pure AppleScript code (using regex substitution) that can do this?

Alternatively, I am happy for AppleScript to call a CLI tool via do shell script to process the text, as long as it is a built-in tool that doesn't require Homebrew or any other third-party software to be installed.

Best Answer

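Here is one example using sed, which is included with macOS. Note that the substitutions assume the link text is literally "link", as in the sample input: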

set htmlString to "This is a link: <a href=\"https://duck.com\">link</a>"

set mdString to do shell script "/usr/bin/sed -E -e 's|<a href=\"|[link](|g' -e 's|\">link</a>|)|g' <<< " & htmlString's quoted form

Result:

"This is a link: [link](https://duck.com)"
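
If the link text is not always the word "link", the two substitutions can be folded into one that captures both the URL and the text. The following is a minimal sketch rather than a complete HTML parser: it assumes each anchor sits on a single line, that href is the only attribute and is double-quoted, and that the link text contains no nested tags. The function name html_links_to_md is hypothetical. It is shown as plain shell so the quoting stays readable; inside a do shell script string, every " and \ in the command would need a preceding backslash.

```shell
# Hypothetical helper: converts simple <a href="URL">TEXT</a> anchors
# read from standard input into Markdown [TEXT](URL) links.
html_links_to_md() {
  # Group 1 captures the URL (everything up to the closing quote);
  # group 2 captures the link text (everything up to the next tag).
  sed -E 's|<a href="([^"]*)">([^<]*)</a>|[\2](\1)|g'
}

printf '%s\n' 'See <a href="https://duck.com">DuckDuckGo</a> and <a href="https://example.com">this page</a>.' | html_links_to_md
# prints: See [DuckDuckGo](https://duck.com) and [this page](https://example.com).
```

Anchors with extra attributes (class, target and so on) or single-quoted URLs would pass through unchanged with this pattern.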

This can also be done without the do shell script command, as in this example:

set htmlString to "This is a link: <a href=\"https://duck.com\">link</a>"

set htmlString to findAndReplaceInText(htmlString, "<a href=\"", "[link](")
set htmlString to findAndReplaceInText(htmlString, "\">link</a>", ")")

on findAndReplaceInText(theText, theSearchString, theReplacementString)
    -- Split the text wherever the search string occurs
    set AppleScript's text item delimiters to theSearchString
    set theTextItems to every text item of theText
    -- Rejoin the pieces with the replacement string between them
    set AppleScript's text item delimiters to theReplacementString
    set theText to theTextItems as string
    -- Reset the delimiters so later code is not affected
    set AppleScript's text item delimiters to ""
    return theText
end findAndReplaceInText

Result:

"This is a link: [link](https://duck.com)"

If This is a link: <a href="https://duck.com">link</a> comes from a file or the clipboard, its double quotes need no escaping, because backslash escapes are only required in string literals typed into the script itself. You then only need to escape the " characters inside the sed command, as shown in the example above.


Other examples:

If This is a link: <a href="https://duck.com">link</a> is in a file:

set htmlFile to "/path/to/filename.ext"
set htmlString to read htmlFile
set mdString to do shell script "/usr/bin/sed -E -e 's|<a href=\"|[link](|g' -e 's|\">link</a>|)|g' <<< " & htmlString's quoted form

Or, processing the file directly:

set htmlFile to "/path/to/filename.ext"
set mdString to do shell script "/usr/bin/sed -E -e 's|<a href=\"|[link](|g' -e 's|\">link</a>|)|g'" & space & htmlFile's quoted form
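
When the converted text should end up in a new file rather than in an AppleScript variable, the shell can redirect sed's output. This is a rough sketch of the underlying command only, using the same two substitutions as the answer (so it still assumes the link text is "link"). The names input.html and output.md and the scratch directory are hypothetical, chosen so the snippet can be run as-is.

```shell
# Hypothetical file names in a scratch directory, used only for this demo.
workdir=$(mktemp -d)
printf '%s\n' 'This is a link: <a href="https://duck.com">link</a>' > "$workdir/input.html"

# sed reads the file named as its last argument; the shell redirects the
# converted text into a new file and leaves the original untouched.
sed -E -e 's|<a href="|[link](|g' -e 's|">link</a>|)|g' "$workdir/input.html" > "$workdir/output.md"

cat "$workdir/output.md"
# prints: This is a link: [link](https://duck.com)
```

Writing to a separate file keeps the source HTML intact, which makes it easy to compare the two if a link was missed.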

If This is a link: <a href="https://duck.com">link</a> is on the clipboard:

set htmlString to (the clipboard as text)
set mdString to do shell script "/usr/bin/sed -E -e 's|<a href=\"|[link](|g' -e 's|\">link</a>|)|g' <<< " & htmlString's quoted form

Note: The findAndReplaceInText() handler can also be used in place of the do shell script command in these other examples.