macOS – extract the top-level domain and the second-level domain from a URL

bash command-line macos

I'd like to extract the top-level domain and the second-level domain from a URL like "https://apple.stackexchange.com/questions/ask"

Example URLs with the desired results below.

https://apple.stackexchange.com/questions/ask
   stackexchange.com


   nytimes.com

https://nextdoor.com/news_feed/?post=117602&ct=-A17-ghvVOF0tfn9vptW_5a7JOBEyP4w6_hJAZUnMQqN56952&ec=OWKiQRDj9vEHefhwfGYAE0s%3D&lc=1002&is=tpe
   nextdoor.com


   amazon.com

http://www.verizon.net/index.php
   verizon.net

I'm ignoring multi-tier domains (such as co.uk). I'd prefer to use Bash on macOS.

There are lots of pages on getting the full domain name:

  1. Extract domain name from URL using bash shell parameter substitution

    https://www.cyberciti.biz/faq/get-extract-domain-name-from-url-in-linux-unix-bash/

  2. echo http://example.com/index.php | awk -F[/:] '{print $4}'

    https://stackoverflow.com/a/11385736/1360075

I do not need this level of perfection:

https://github.com/john-kurkowski/tldextract

Best Answer

As you are already using awk and are looking for a simple solution:

awk -F/ '{n=split($3, a, "."); printf("%s.%s", a[n-1], a[n])}' <<< 'http://www.example.com/index.php'
      ^ ^   ^^^^^^^^^^^^^^^^^^                  ^^^^^^^^^^^^
      | |          |                                  |
      | |          |                            last two elements 
      | |          |
      | |          +--- Split the 3rd field (aka the part after //) into
      | |               the array 'a', using '.' as the separator for splitting.
      | |               Returns the number of created array elements in 'n'.
      | |
      | +-------------- The awk code between the '' gets run once for every
      |                 input line, with the fields split by -F/ stored in
      |                 $1, $2 etc. In our case $1 contains "http:", $2 is 
      |                 empty, $3 contains "www.example.com" and $4 etc. the
      |                 various path elements (if there are any)
      |
      +---------------- Split the input lines into fields, separated by '/'
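
If you want to run this over several URLs, note that the printf above does not emit a trailing newline, so the results for multiple input lines would run together. Here is a minimal sketch of the same idea wrapped in a function that prints one result per line. The function name last_two_labels is made up for illustration, and stripping a ":port", "?query" or "#fragment" that directly follows the host is my own addition, not part of the one-liner. It assumes every URL has a scheme (so the host is the 3rd '/'-separated field) and no "user:pass@" part, and like the one-liner it ignores multi-tier domains.

```shell
#!/usr/bin/env bash
# Hypothetical helper (name chosen for illustration): print the last two
# dot-separated labels of each URL's host, one result per line.
# URLs are taken from the arguments if any are given, otherwise from stdin.
last_two_labels() {
  if [ "$#" -gt 0 ]; then
    printf '%s\n' "$@"
  else
    cat
  fi | awk -F/ '
    {
      host = $3                      # text between "//" and the next "/"
      sub(/[:?#].*$/, "", host)      # assumption: drop ":port", "?query", "#fragment"
      n = split(host, a, ".")        # split the host into labels
      if (n >= 2) {
        print a[n-1] "." a[n]        # last two labels, newline-terminated
      } else {
        print host                   # single-label host such as "localhost"
      }
    }'
}
```

Usage, either with arguments or with one URL per line on standard input:

```shell
last_two_labels 'https://apple.stackexchange.com/questions/ask' 'http://www.verizon.net/index.php'
# stackexchange.com
# verizon.net

printf '%s\n' 'http://www.example.com/index.php' | last_two_labels
# example.com
```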