macOS: AppleScript curl with authentication fails


QUESTION: I need a working method in AppleScript to get the correct source of the page, WITHOUT loading the page in a browser.

Sample link: https://www.idealista.it/immobile/16679597/

Result: the returned HTML is wrong; it is a page about authentication.

INITIAL CODE (common to all the tries below):

set MyUser to "username@dom.com"
set MyPass to "password"
set UrlOfPage to "https://www.idealista.it/immobile/16679597/"

TRIES (all taken from the examples at https://ec.haxx.se/http-auth.html):

  • works, but needs the page to be loaded in Safari

    tell front document of application "Safari" to set StrHtml to (get source) as string
    
  • returns the wrong HTML

    set StrHtml to (do shell script "curl --user " & MyUser & ":" & MyPass & " " & UrlOfPage)
    
    set StrHtml to (do shell script "curl --anyauth --user " & MyUser & ":" & MyPass & " " & UrlOfPage)
    
    set StrHtml to (do shell script "curl --digest --user " & MyUser & ":" & MyPass & " " & UrlOfPage)
    
    set StrHtml to (do shell script "curl --negotiate --user " & MyUser & ":" & MyPass & " " & UrlOfPage)
    
    set StrHtml to (do shell script "curl --ntlm --user " & MyUser & ":" & MyPass & " " & UrlOfPage)
    
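  • doesn't compile: AppleScript reports "unknown token" (the `\` line continuation copied from the shell example is not a valid escape sequence inside an AppleScript string)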

    set StrHtml to (do shell script "curl --proxy-anyauth --proxy-user " & MyUser & ":" & MyPass & " https://www.idealista.it/immobile/16679597/ \ --proxy https://proxy.idealista.it/immobile/16679597:80/")
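For context on the tries above: `--user user:pass` makes curl send HTTP authentication credentials in an `Authorization` request header (HTTP Basic by default, or Digest, NTLM, etc. when `--digest`, `--ntlm` and similar flags are given). It does not fill in a site's HTML login form. A minimal Python sketch of the Basic header value, using the placeholder credentials from the INITIAL CODE; the function name is illustrative only:

```python
import base64


def basic_auth_header(user: str, password: str) -> str:
    """Return the Authorization header value used by HTTP Basic auth.

    This mirrors what `curl --user user:password URL` sends by default:
    "user:password" base64-encoded and prefixed with "Basic ".
    """
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


if __name__ == "__main__":
    # Placeholder credentials, as in the question's INITIAL CODE.
    print("Authorization:", basic_auth_header("username@dom.com", "password"))
```

If the site instead expects a login form or session cookies, such a header is simply ignored, which may explain why every variant above returns the same page.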
    

Could somebody please help me?

Best Answer

Dedicated Tools

Given the problems encountered with curl and AppleScript, consider using a dedicated tool such as Beautiful Soup, a Python library. See "How To Scrape Web Pages with Beautiful Soup and Python 3" for a comprehensive introduction.
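A minimal, dependency-free sketch of the "fetch without a browser" step, assuming Python 3 and only the standard library. The `User-Agent` value, the function names and the script name `fetch_page.py` are hypothetical; there is no guarantee the site serves the full listing to a non-browser client (it may return the same authentication page curl received). The title extraction uses the standard `html.parser` module, which is also what Beautiful Soup uses when given `"html.parser"`.

```python
import sys
import urllib.request
from html.parser import HTMLParser

# Hypothetical browser-like User-Agent string; an assumption, not a requirement.
DEFAULT_UA = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)"


def build_request(url: str, user_agent: str = DEFAULT_UA) -> urllib.request.Request:
    """Create a plain GET request that carries a browser-like User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_html(url: str, timeout: float = 30.0) -> str:
    """Download the page source over HTTP(S) without opening a browser."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


class TitleParser(HTMLParser):
    """Collect the text inside the first <title> element."""

    def __init__(self) -> None:
        super().__init__()
        self._in_title = False
        self._done = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def page_title(html: str) -> str:
    """Return the stripped text of the first <title>, or "" if there is none."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title.strip()


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 fetch_page.py URL  (prints the page source to stdout)
    print(fetch_html(sys.argv[1]))
```

Because the script prints to stdout, AppleScript could capture its output without Safari, e.g. `do shell script "python3 /path/to/fetch_page.py " & quoted form of UrlOfPage` (the path is hypothetical).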

Alternatively, numerous other tools can help; see "Web scraping software" on Wikipedia. Many of these tools are free, open source, and can be called from the command line.

I have previously used Web::Scraper, a Perl module, to extract property listings.
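The idea behind Web::Scraper is to declare which elements hold each field and let the tool pull out their text. A rough Python analogue with the standard library, assuming the listing marks its fields with CSS classes. The class names `listing-title` and `listing-price` are hypothetical placeholders, not the site's real markup, and would have to be replaced after inspecting the actual page source.

```python
from html.parser import HTMLParser

# Hypothetical field -> CSS class mapping; replace with the page's real classes.
FIELDS = {"title": "listing-title", "price": "listing-price"}

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class FieldExtractor(HTMLParser):
    """Collect the text of the first element carrying each configured class."""

    def __init__(self, fields):
        super().__init__()
        self.fields = fields
        self.results = {}
        self._current = None  # name of the field being captured, if any
        self._depth = 0  # open elements inside the captured element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self._current is not None:
            self._depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        for name, css_class in self.fields.items():
            if css_class in classes and name not in self.results:
                self._current, self._depth = name, 1
                self.results[name] = ""
                return

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> hold no text; ignore them.
        pass

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or self._current is None:
            return
        self._depth -= 1
        if self._depth == 0:
            self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self.results[self._current] += data


def extract_fields(html, fields=FIELDS):
    """Return {field: text}, with whitespace collapsed, for fields found in html."""
    parser = FieldExtractor(fields)
    parser.feed(html)
    parser.close()
    return {name: " ".join(text.split()) for name, text in parser.results.items()}
```

The parser only counts nesting depth, so it copes with markup nested inside a field, but HTML that omits optional closing tags can confuse it; a dedicated library handles such pages and richer selectors more robustly.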