Ubuntu – Extracting a specific string after a given string from HTML file using a bash script

bash, command line, text processing

I have an HTML file, momcpy.html, from which I want to extract the string that follows a given string.
The file content looks like this:

<tr><br>
<th height="12" bgcolor="#808080"><label for="<br>
 LSCRM:Abhijeet<br>
 <br>
 MCRM:Bhargav<br>
 <br>
 TLGAPI:GAURAVAURAV<br>
 <br>
 MOM:MANIKA"></td><br>

This is present on one of the lines of HTML.

I want to extract MANIKA and store it in a variable. Basically, I want to extract whatever string follows MOM:, and that string can vary.

I have tried:

file='/home/websphe/tomcat/webapps/MOM/web/momcpy.html'
y=$(awk '$1=="MOM:"{print $2}' "$file")
echo "$y"

But that didn't work.

Best Answer

I can't sensibly recommend this, because parsing HTML with regular expressions is unlikely to end well, but you might be able to get the string MANIKA with

sed -nr '/MOM:/ s/.*MOM:([^"]+).*/\1/p' file

It works OK on your sample anyway...

Notes

-n don't print anything until we ask for it
-r use extended regular expressions (ERE)
/string/ find lines containing string
s/old/new/ replace old with new
.* any number of any characters
([^"]+) save one or more characters that are not "
\1 backreference to the saved characters
p print just the lines we changed
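
Since the question asks for the value in a variable, the sed command can be wrapped in command substitution. This is a minimal sketch, assuming GNU sed (for -r) and a throwaway file created with mktemp standing in for the real momcpy.html; the variable name mom is only illustrative.

```shell
#!/usr/bin/env bash
# Hypothetical sample file standing in for momcpy.html (content copied from the question)
file=$(mktemp)
cat > "$file" <<'EOF'
<tr><br>
<th height="12" bgcolor="#808080"><label for="<br>
 LSCRM:Abhijeet<br>
 <br>
 MCRM:Bhargav<br>
 <br>
 TLGAPI:GAURAVAURAV<br>
 <br>
 MOM:MANIKA"></td><br>
EOF

# Capture whatever follows MOM: up to (not including) the next double quote
mom=$(sed -nr '/MOM:/ s/.*MOM:([^"]+).*/\1/p' "$file")
echo "$mom"

rm -f "$file"
```

On this sample the script prints MANIKA. If MOM: appeared on more than one line, the variable would hold one match per line.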

Related Solutions

Ubuntu – Replace string in bash script

-i can only be used with sed when you pass it a file; it means "edit in place". Without it, sed writes its output to stdout (usually the console). With -i, sed applies the replacements to the file itself.

The following code reads the contents of jasperreports.properties into the variable $input (line 1) and finds the string to be replaced (line 2).
On the third line, it echoes the input string and pipes it through sed for replacement. sed writes the result to stdout, which is captured by $( and ) and stored back in $input.

input=$(cat jasperreports.properties)
find=$(grep "$jasper" jasperreports.properties | awk -F"reports/" '{print $2}')
input=$(echo "$input" | sed "s/$find/charts/")

If you want to apply the changes immediately to the file:

find=$(grep "$jasper" jasperreports.properties | awk -F"reports/" '{print $2}')
sed -i "s/$find/charts/" jasperreports.properties
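
The difference between the two forms can be checked on a scratch file. This is a small sketch, assuming GNU sed (BSD sed expects an argument after -i); the file contents and variable names are invented for illustration.

```shell
#!/usr/bin/env bash
# Throwaway file; the contents are made up for this demonstration
f=$(mktemp)
echo 'path=reports/sales' > "$f"

# Without -i: the result goes to stdout and the file is left untouched
preview=$(sed 's/sales/charts/' "$f")
before=$(cat "$f")

# With -i (GNU sed): the file itself is rewritten
sed -i 's/sales/charts/' "$f"
after=$(cat "$f")

echo "preview: $preview"
echo "before:  $before"
echo "after:   $after"

rm -f "$f"
```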

From man sed:

   s/regexp/replacement/
          Attempt to match regexp against the pattern space. If
          successful, replace that portion matched with replacement. The
          replacement may contain the special character & to refer to that
          portion of the pattern space which matched, and the special
          escapes \1 through \9 to refer to the corresponding matching
          sub-expressions in the regexp.
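
To make the man page excerpt concrete, here is a short sketch of & and \1 in a replacement. The input strings are arbitrary examples, and the patterns use basic regular expressions, so the group is written \( ... \).

```shell
#!/usr/bin/env bash
# & stands for the whole portion that matched
whole=$(echo 'web0' | sed 's/web[0-9]/[&]/')

# \1 stands for the first \( ... \) sub-expression
group=$(echo 'MOM:MANIKA' | sed 's/MOM:\(.*\)/\1/')

echo "$whole"
echo "$group"
```

The first command wraps the match in brackets ([web0]); the second keeps only the captured part (MANIKA).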

Ubuntu – replace a string by variable in a file using bash script

AWK can search and replace text as well, so there is no need to use grep or sed. The code below extracts the number from the second field (the N in webN), increments it, and replaces the second field with webN+1.

$ cat testInput.txt
project web0
other
project web1
$ awk '/web/{ num=substr($2,4)+1;$2="web"num };1' testInput.txt
project web1
other
project web2

This prints the edited file to the screen. You can save the output to another file with awk [rest of code here] > fileName.txt, then replace the original with the new file using mv fileName.txt oldFile.txt
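
The save-then-replace step can be sketched as follows. This is one possible arrangement, not the only one: it works in a scratch directory created with mktemp, and the names work and tmp are placeholders. The original is overwritten only if awk exits successfully.

```shell
#!/usr/bin/env bash
# Scratch directory with a copy of the sample input from above
work=$(mktemp -d)
printf 'project web0\nother\nproject web1\n' > "$work/testInput.txt"

# Write awk's output to a temporary file first, then swap it in
tmp=$(mktemp "$work/out.XXXXXX")
if awk '/web/{ num=substr($2,4)+1; $2="web"num };1' "$work/testInput.txt" > "$tmp"; then
  mv "$tmp" "$work/testInput.txt"
fi

result=$(cat "$work/testInput.txt")
echo "$result"

rm -rf "$work"
```

Redirecting straight back to the input file (awk ... file > file) would truncate it before awk reads it, which is why the temporary file is needed.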
