I have an AppleScript that returns the title from a website. The only issue is that it also contains lots of unwanted HTML (I think?). Most of the time, I can overcome this by removing the common characters using the following code:
on CharacterRemover(inputString, ReplaceChar)
    set TID to AppleScript's text item delimiters
    set AppleScript's text item delimiters to ReplaceChar
    set pieces to text items of inputString -- break the string apart at each ReplaceChar
    set AppleScript's text item delimiters to "" -- join with nothing, which deletes ReplaceChar
    set inputString to pieces as text -- put the string back together
    set AppleScript's text item delimiters to TID -- restore the original delimiters
    return inputString
end CharacterRemover
set FirstTitle to "<!-- react-text: 45 -->“<!-- /react-text --><!--
react-text: 46 -->Megan Fox<!-- /react-text --><!-- react-text: 47 --
>”<!-- /react-text -->" --the format of the returned title
set FirstTitle to CharacterRemover(FirstTitle, "-")
set FirstTitle to CharacterRemover(FirstTitle, ">")
set FirstTitle to CharacterRemover(FirstTitle, "<")
set FirstTitle to CharacterRemover(FirstTitle, "!")
set FirstTitle to CharacterRemover(FirstTitle, "/")
set FirstTitle to CharacterRemover(FirstTitle, "reacttext")
set FirstTitle to CharacterRemover(FirstTitle, ":")
set FirstTitle to CharacterRemover(FirstTitle, "”")
set FirstTitle to CharacterRemover(FirstTitle, "“")
set z to 0
repeat 10 times
    set FirstTitle to CharacterRemover(FirstTitle, z)
    set z to z + 1
end repeat
set FirstTitle to CharacterRemover(FirstTitle, " ")
display dialog FirstTitle
However, since this code removes the numbers, when I get titles such as
<!-- react-text: 477 -->“<!-- /react-text --><!-- react-text: 478 -->iPhone 8<!-- /react-text --><!-- react-text: 479 -->”<!-- /react-text -->
it returns as "iPhone" instead of "iPhone 8".
Edit: On the website "higherorlower.com", I am using the JavaScript "document.getElementsByClassName" to return the title of the given search amount.
Any ideas on how to overcome this?
Best Answer
I would advise you to look at (and, if you wish, give feedback about) the method you're using to retrieve the information from the website, as the best and most reliable option would be to use a different method such that you don't have to deal with the ReactJS comments at all.
If you'd included that part of your AppleScript along with the rest, there might have been a chance to solve your problem at its source.
Nonetheless, here's one method of stripping the tags from your text strings, though by no means the only method, nor necessarily the most graceful or efficient. But it's reasonably clean and, presuming the tags are all simple ReactJS comment tags, it will do a reliable job.
string1 is a copy of your variable FirstTitle, including the line breaks that it contained (I'm not sure whether these were in there intentionally or are an artefact of when you copied your script over into the browser); their presence or absence doesn't affect the efficacy of my script, but merely necessitated the two lines at the start of the stripTags handler that get rid of them.
string2 is the text you supplied at the bottom of your question.
I've shown the output of each of these following processing. I retained the double so-called "smart" quotes that are part of the string and lie outwith the tags; I did see that you had opted to eliminate them, but their presence here, merely for demonstration purposes, is a nice visual reassurance that the script targets only the tags and preserves the text in between. I hope you don't mind if I leave those smart quotes for you to deal with as you wish.
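As a rough cross-check of the same idea in JavaScript (which the question already uses to read the page), the comments can also be removed with a regular expression. This is only a sketch and not the stripTags handler described above: the name stripReactComments is made up for illustration, and it assumes every tag is a simple, non-nested <!-- ... --> comment.

```javascript
// Hypothetical helper (not from the original answer): removes every
// <!-- ... --> comment from a string. Assumes simple, non-nested comments.
function stripReactComments(text) {
  return text
    // Drop stray line breaks first, so a closing "--" + newline + ">"
    // is rejoined into "-->" before the comments are matched.
    .replace(/[\r\n]+/g, "")
    // Remove each comment; the non-greedy match stops at the first "-->".
    .replace(/<!--[\s\S]*?-->/g, "");
}

// The second title from the question.
const string2 =
  "<!-- react-text: 477 -->“<!-- /react-text --><!-- react-text: 478 -->" +
  "iPhone 8<!-- /react-text --><!-- react-text: 479 -->”<!-- /react-text -->";

console.log(stripReactComments(string2)); // “iPhone 8”
```

Because only the comments are matched, digits that belong to the title (such as the 8 in iPhone 8) survive, and so do the smart quotes.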
Let me know if you have any queries.
ADDED 2018-05-12:
@cjeccjec Thank you for updating the website information with the correct URL. Tip for next time: include the code you're using to get the title. It'll be easier for people to help you and it will attract more help as well.
Luckily, this problem seems quite straightforward. Using getElementsByClassName() is a good idea, and you even managed to identify the classname of interest, term-keyword__keyword. Well done.
The elements assigned to that classname are <p> elements. They do have a title property, but it's empty, so I suspect that's not what you're using nor what you're after at all.
They also have a property called textContent, which, as its name suggests, returns the text contained within the element, i.e. the labels of the items being compared in this game. I believe that's what you're after, and it's completely free from ReactJS tags.
This code returns an array of the textContent properties from the three loaded p.term-keyword__keyword tags on the site at any one time: the two currently visible and in play being compared; and one off-screen to the right waiting to scroll into view for the next comparison.
I also took the liberty of slicing off the quotes from the beginning and end of the texts.
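A minimal sketch of what such code might look like, assuming the classname term-keyword__keyword is still what the site uses. The function name getKeywords and its doc parameter are hypothetical, introduced only so the logic can be tried outside a browser; slice(1, -1) assumes each label starts and ends with exactly one quote character.

```javascript
// Hypothetical helper: collects the label text of every element that has
// the class "term-keyword__keyword". `doc` is the page's document object.
function getKeywords(doc) {
  const elements = doc.getElementsByClassName("term-keyword__keyword");
  // Array.from turns the array-like collection into a real array and maps
  // each element to its textContent with the first and last character
  // (the surrounding quotes) sliced off.
  return Array.from(elements, (el) => el.textContent.slice(1, -1));
}

// In a browser, pass the real document; elsewhere this line is skipped.
if (typeof document !== "undefined") {
  console.log(getKeywords(document));
}
```

Reading textContent rather than the element's markup is what avoids the ReactJS comments in the first place, so no stripping step is needed afterwards.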
Incorporate this into AppleScript like so:
Those were the results that I got returned whilst playing the game. I was trying to guess whether "Microsoft Word" or "Moobs" had more internet searches, which I got correct; then "Malaysia" scrolled into view as I already knew it would.
Using this method, you don't need to strip any ReactJS tags away, nor any quote marks.