Ubuntu – Extracting text from a file and writing the result to a file

bash, command line

After using grep on an HTML file, I get the following output:

      <div id="v3060000-3062005" class="BLAH...>
      <div id="v50001027-50002018" class="BLAH...>
      <div id="v907200-907202" class="BLAH...>
      <div id="v20024011-20024012" class="BLAH...>

I need to extract the strings of numbers from the lines above and combine them into a URL such as:

http://x.y.z/3060000-3062005,50001027-50002018,907200-907202,20024011-20024012.mp3

May I know how I can do this using a shell script?

Best Answer

Normally, I would advise that you use a proper HTML parser to parse HTML.

However, this data looks pretty straightforward: using a double quote (optionally followed by "v") as the field separator, grab the 2nd field of each line, then join the pieces with commas.

result=$( grep ... file.html | awk -F'"v?' '{print $2}' | paste -sd, )
echo "http://x.y.z/$result.mp3"
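
To check the pipeline without the original HTML file, it can be run against the four sample lines from the question. This is only a sketch: the `input` variable and the closed `class="BLAH"` attribute are stand-ins (assumptions) for the real grep output, which is truncated in the question, and the explicit `-` operand to paste is added so the command also works with paste implementations that require a file argument.

```shell
# Stand-in for the grep output; the class value is a placeholder, not real data.
input='      <div id="v3060000-3062005" class="BLAH">
      <div id="v50001027-50002018" class="BLAH">
      <div id="v907200-907202" class="BLAH">
      <div id="v20024011-20024012" class="BLAH">'

# Split each line on a double quote optionally followed by "v" and keep field 2,
# then join all lines into one comma-separated line ("-" means read stdin).
result=$(printf '%s\n' "$input" | awk -F'"v?' '{print $2}' | paste -sd, -)

url="http://x.y.z/$result.mp3"
echo "$url"
```

With this sample input, the script should print the URL given in the question. If an id could ever appear without the surrounding double quotes, the field separator would need adjusting.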