Ubuntu – Extracting a specific string after a given string from HTML file using a bash script

bash, command-line, text-processing

I have an HTML file, momcpy.html, from which I want to extract the string that follows a given string.
The file content looks like this:

<tr><br>
<th height="12" bgcolor="#808080"><label for="<br>
 LSCRM:Abhijeet<br>
 <br>
 MCRM:Bhargav<br>
 <br>
 TLGAPI:GAURAVAURAV<br>
 <br>
 MOM:MANIKA"></td><br>

This is present on one of the lines of HTML.

I want to extract MANIKA and store it in a variable. Basically, I want to extract whatever string follows MOM:, and that string can change.

I have tried:

file='/home/websphe/tomcat/webapps/MOM/web/momcpy.html'
y=$( awk '$1=="MOM:"{print $2}' $file)
echo "$y"

But that didn't work.

Best Answer

I can't sensibly advise doing this, because parsing HTML with regex is unlikely to end well, but you might be able to get the string MANIKA with

sed -nr '/MOM:/ s/.*MOM:([^"]+).*/\1/p' file

It works OK on your sample anyway...
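Since the question asks for the value in a variable, here is a minimal sketch of wrapping the same sed command in a command substitution. The temporary file is only a stand-in for momcpy.html so the snippet runs on its own; in practice you would point file at the real path. It assumes GNU sed, as the -r flag does.

```shell
#!/usr/bin/env bash
# Stand-in for momcpy.html (assumption: same content as the sample in the question)
file=$(mktemp)
cat > "$file" <<'EOF'
<tr><br>
<th height="12" bgcolor="#808080"><label for="<br>
 LSCRM:Abhijeet<br>
 <br>
 MCRM:Bhargav<br>
 <br>
 TLGAPI:GAURAVAURAV<br>
 <br>
 MOM:MANIKA"></td><br>
EOF

# Capture whatever sits between MOM: and the next double quote
y=$(sed -nr '/MOM:/ s/.*MOM:([^"]+).*/\1/p' "$file")
echo "$y"

# Clean up the stand-in file
rm -f "$file"
```

If several lines contain MOM:, y will hold one value per line, so you may want to pipe through head -n 1 in that case.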

Notes

  • -n don't print anything unless we ask for it
  • -r use extended regular expressions (ERE); GNU sed also accepts -E
  • /string/ act only on lines containing string
  • s/old/new/ replace old with new
  • .* any number of any characters
  • ([^"]+) capture one or more characters that are not "
  • \1 backreference to the captured characters
  • p print only the lines where the substitution succeeded
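As for why the awk attempt in the question printed nothing: awk splits on whitespace by default, so on that line $1 is the whole of MOM:MANIKA"></td><br> and never equals MOM: exactly. A possible awk variant, offered only as a sketch with the same caveat about parsing HTML this way, is to use MOM: itself as the field separator. The two-line temporary file below is a hypothetical stand-in for the real one.

```shell
#!/usr/bin/env bash
# Stand-in input (assumption: mimics two lines of the sample file)
file=$(mktemp)
printf '%s\n' ' MCRM:Bhargav<br>' ' MOM:MANIKA"></td><br>' > "$file"

# Split each line on the text MOM: ; only lines that contain it have NF > 1.
# Then strip everything from the first double quote onward in the second field.
z=$(awk -F'MOM:' 'NF > 1 { sub(/".*/, "", $2); print $2 }' "$file")
echo "$z"

# Clean up the stand-in file
rm -f "$file"
```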