Convert bulleted list HTML code to Markdown in AppleScript

Tags: applescript, command-line, html, markdown, text

I have this text to manipulate in AppleScript (e.g. the text of a variable):

Example note exported from Apple.

<ul>
  <li>Indent</li>
  <ul>
    <li>*Further* indent</li>
    <ul>
      <li>Even **further **indent. With a [link](https://duck.com).</li>
    </ul>
  </ul>
</ul>

End note.

I'm converting it all from HTML to Markdown. The one remaining bit of HTML I need to clean up is the bulleted list, so that the result is (with real tabs for the indentation):

Example note exported from Apple.

- Indent
    - *Further* indent
        - Even **further **indent. With a [link](https://duck.com).

End note.

It has to handle nesting to any number of levels, and the items may contain rich text, as in this example. I prefer the Markdown output to use hyphens as bullets and tabs for the indentation.

It also has to be self-contained inside the AppleScript: no external .py files etc., and it must not require Homebrew or any other third-party tool to be installed.

Best Answer

The following example AppleScript code is intended to work with bulleted list HTML code as shown in the OP. That is, the variable passed in should contain only the code that defines the bulleted list, not other arbitrary HTML code.

It is not limited to the specific example shown here: it has been tested on a variety of other samples containing just bulleted list HTML code and produced the appropriate output for them as well. Note that, as coded, the nesting level is derived from the number of leading spaces, so it assumes each list item is on its own line and indented two additional spaces per level, as in the OP.

--  # Define exportedNote variable containing the bulleted list HTML code.

set exportedNote to "<ul>
  <li>Indent</li>
  <ul>
    <li>*Further* indent</li>
    <ul>
      <li>Even **further** indent. With a [link](https://duck.com).</li>
    </ul>
  </ul>
</ul>"

--  # Create an AppleScript list from the lines of bulleted list HTML code.

set exportedNoteList to paragraphs of exportedNote

--  # Process the list, acting only on items that contain "<li>" 
--  # as they are the only ones relevant to converting the 
--  # bulleted list HTML code to Markdown.

set tempList to {}
repeat with i from 1 to the number of items in exportedNoteList
    if item i of exportedNoteList contains "<li>" then
        set thisItem to item i of exportedNoteList
        set thisItem to findAndReplaceInText(thisItem, "</li>", "")
        set numberOfLeadingSpaces to ((offset of "<" in thisItem) - 1)
        if numberOfLeadingSpaces is less than 4 then
            set searchString to characters 1 thru numberOfLeadingSpaces of thisItem & "<li>" as text
            set thisItem to findAndReplaceInText(thisItem, searchString, "- ")
            set end of tempList to thisItem
        end if
        if numberOfLeadingSpaces is greater than 3 then
            set searchString to characters 1 thru numberOfLeadingSpaces of thisItem & "<li>" as text
            set thisItem to findAndReplaceInText(thisItem, searchString, "- ")
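            --  # Two spaces per nesting level; 'div' keeps the result a whole number.
            set numberOfLeadingTabs to numberOfLeadingSpaces div 2
            --  # Use 'j' so the inner loop does not reuse the outer loop variable 'i'.
            repeat with j from 1 to (numberOfLeadingTabs - 1)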
                set thisItem to tab & thisItem
            end repeat
            set end of tempList to thisItem
        end if
    end if
end repeat

--  # Update the contents of the exportedNoteList
--  # list to contain only the relevant list items.

set exportedNoteList to tempList

--  # Convert the exportedNoteList list to text.

set AppleScript's text item delimiters to linefeed
--  # Coercing the list to text joins its items with the current delimiter.
set convertedNote to exportedNoteList as text
set AppleScript's text item delimiters to {}

--  # 'return' is only used to show its value in Script Editor. Use
--  # the convertedNote variable as needed in the working code.

return convertedNote


--  # Handler(s)

--  # Note: Handlers may be placed as one chooses as appropriate.
--  # My preference is at the bottom of the rest of the code.

on findAndReplaceInText(theText, theSearchString, theReplacementString)
    set AppleScript's text item delimiters to theSearchString
    set theTextItems to every text item of theText
    set AppleScript's text item delimiters to theReplacementString
    set theText to theTextItems as string
    set AppleScript's text item delimiters to ""
    return theText
end findAndReplaceInText
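
The handler above relies on AppleScript's text item delimiters: it splits the text wherever the search string occurs, then joins the pieces with the replacement string. As a minimal sketch of those two steps done inline, the snippet below uses a hypothetical sample line and the illustrative variable names sampleLine, savedDelimiters and replacedLine, none of which are part of the answer's code. It also saves and restores the delimiters, which is a common precaution rather than something the handler itself does.

```applescript
--  # Sketch only: the two steps findAndReplaceInText performs,
--  # written inline. sampleLine is a hypothetical input line.
set sampleLine to "  <li>Indent</li>"

--  # Remember the current delimiters so they can be restored afterwards.
set savedDelimiters to AppleScript's text item delimiters

--  # Step 1: split the text wherever the search string occurs.
set AppleScript's text item delimiters to "<li>"
set theTextItems to every text item of sampleLine

--  # Step 2: join the pieces back together with the replacement string.
set AppleScript's text item delimiters to "- "
set replacedLine to theTextItems as text

--  # Put the delimiters back the way they were found.
set AppleScript's text item delimiters to savedDelimiters

--  # Returned only so both values can be inspected in Script Editor.
return {theTextItems, replacedLine}
```

Here the split should yield the two pieces "  " and "Indent</li>", and the join should yield "  - Indent</li>", which is why the main code removes "</li>" in a separate call.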

Result:

"- Indent
    - *Further* indent
        - Even **further** indent. With a [link](https://duck.com)."

Which displays as the following in a browser on this web site:

  • Indent
    • Further indent
      • Even further indent. With a link.
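
If the exported HTML is not reliably indented two spaces per level, one possible variation is to take the nesting depth from the <ul> and </ul> tags instead of from the leading spaces. The following is only a sketch of that different technique, not the code tested above. It assumes each tag or <li>...</li> item sits on its own line, and the names nestingDepth, markdownLines, indentText and the trimLeadingWhitespace handler are introduced here for illustration. The sample input is deliberately indented irregularly.

```applescript
--  # Sketch only: nesting depth comes from <ul> / </ul> tags, not from
--  # leading spaces. Assumes one tag or one <li>...</li> item per line.

set exportedNote to "<ul>
<li>Indent</li>
    <ul>
<li>*Further* indent</li>
<ul>
  <li>Even **further** indent. With a [link](https://duck.com).</li>
</ul>
</ul>
</ul>"

set nestingDepth to 0
set markdownLines to {}

repeat with aLine in paragraphs of exportedNote
    set trimmedLine to trimLeadingWhitespace(contents of aLine)
    if trimmedLine begins with "<ul" then
        set nestingDepth to nestingDepth + 1
    else if trimmedLine begins with "</ul" then
        set nestingDepth to nestingDepth - 1
    else if trimmedLine begins with "<li>" then
        set itemText to findAndReplaceInText(trimmedLine, "<li>", "")
        set itemText to findAndReplaceInText(itemText, "</li>", "")
        --  # One tab per nesting level below the outermost list.
        set indentText to ""
        repeat (nestingDepth - 1) times
            set indentText to indentText & tab
        end repeat
        set end of markdownLines to indentText & "- " & itemText
    end if
end repeat

--  # Join the converted items with linefeeds.
set AppleScript's text item delimiters to linefeed
set convertedNote to markdownLines as text
set AppleScript's text item delimiters to ""

return convertedNote

--  # Hypothetical helper: removes leading spaces and tabs from a line.
on trimLeadingWhitespace(theText)
    repeat while theText begins with space or theText begins with tab
        if (length of theText) is 1 then return ""
        set theText to text 2 thru -1 of theText
    end repeat
    return theText
end trimLeadingWhitespace

--  # Same handler as in the code above, repeated so the sketch runs alone.
on findAndReplaceInText(theText, theSearchString, theReplacementString)
    set AppleScript's text item delimiters to theSearchString
    set theTextItems to every text item of theText
    set AppleScript's text item delimiters to theReplacementString
    set theText to theTextItems as string
    set AppleScript's text item delimiters to ""
    return theText
end findAndReplaceInText
```

For the sample shown, this should give the same three hyphenated lines as the result above, with zero, one and two leading tabs. It would still need adjusting for items that span several lines or for <ol> lists.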