Search replace in XML file with sed or awk

awk, regular expression, sed, text processing, xml

So I have a task whereby I have to manipulate an XML file through a bash shell script.

Here are the steps:

  1. Query XML file for a value.
  2. Take the value and cross reference it to find a new value from a list.
  3. Replace the value of a different element with the new value.

Here is a sample of the XML with non-essential info removed:

<fmreq:fileManagementRequestDetail xmlns:fmreq="http://foobar.com/filemanagement">
      <fmreq:property>
         <fmreq:name>form_category_cd</fmreq:name>
         <fmreq:value>Memos</fmreq:value>
      </fmreq:property>
      <fmreq:property>
         <fmreq:name>object_name</fmreq:name>
         <fmreq:value>Correspondence</fmreq:value>
      </fmreq:property>
</fmreq:fileManagementRequestDetail>

I have to get the value from the value element under object_name, cross reference it, and then replace the value under the form_category_cd value element with the new value:

So if object_name -> value is Correspondence then the form_category_cd -> value might need to be YYZ.

Here's the rub: I can only use the tools available on our server, as our operations group is restricting us to the tools at hand. It was a fight to get xmllint updated and then it got overruled. I'm on a version that does not support --xpath, which believe me is difficult on a good day. Also the version I have available doesn't support namespaces, so xmllint is out.

I've tried sed, but it seems to not like my regex, even though it works fine in every tester I try.

Regex:

(<fmreq\:name>object_name<\/fmreq\:name>)(?:\n\s*)(<fmreq\:value>)(.*)(<\/fmreq\:value>)

I need to get group #3, but sed won't return it. Instead it returns the entire contents of the XML file.

sed -e 's/\(<fmreq\:name>object_name<\/fmreq\:name>\)\(?:\n\s*\)\(<fmreq\:value>\)\(.*\)\(<\/fmreq\:value>\)/\3/' < c3.xml 

I'm not as familiar with awk / gawk, so I'm struggling to figure them out as well, but I'm open to them if a solution can be found.

Would love to have an awk / gawk solution just to make the boss happy since he's an old awk fan, but I'll take what I can get as I'm stumped.

Again I have to use the tools on hand and can't install anything new.

Best Answer

I think that there are a couple of problems in your sed command:

  • You don't use the -n option, so by default sed just prints every line of input to the output (possibly modified by a sed command).

  • You don't need the redirection < c3.xml, because sed recognizes the last argument as a filename.

  • sed is not very well suited for matches over multiple lines, since by default it reads one line at a time into the pattern space.
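The first point can be checked on a small made-up input (the three lines below are purely illustrative and not from the question). When the s command matches nothing and -n is absent, every line comes back unchanged, which is why the original command appeared to return the whole file:

```shell
# Hypothetical three-line input, only for illustration.
input='alpha
beta
gamma'

# Without -n: the substitution matches nothing, yet every line is still printed.
without_n=$(printf '%s\n' "$input" | sed 's/no-such-text/X/')

# With -n and the p flag: only the line where the substitution succeeded is printed.
with_n=$(printf '%s\n' "$input" | sed -n 's/beta/X/p')

printf '%s\n' "$without_n"
printf '%s\n' "$with_n"
```

It is also worth keeping in mind that sed's regular expressions have no (?:...) non-capturing group syntax. That is a Perl-style extension, so a pattern that passes in an online tester may not mean the same thing to sed.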

The following seems to work on your example:

sed -n "/<fmreq:name>object_name<\/fmreq:name>/ {n;p}" c3.xml | sed "s/^\s*<fmreq:value>\(.*\)<\/fmreq:value>/\1/g"

Or, with only one sed invocation:

sed -n "/<fmreq:name>object_name<\/fmreq:name>/ {n;s/^\s*<fmreq:value>\(.*\)<\/fmreq:value>/\1/g;p}" c3.xml
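Since an awk solution was asked for, here is a rough awk sketch of the same extraction. It makes the same assumption as the sed commands, namely that the value element always sits on the line directly after the name element. The temporary file and the variable names xml, found and object_name are only there for illustration:

```shell
# Recreate the sample from the question in a temporary file.
xml=$(mktemp)
cat > "$xml" <<'EOF'
<fmreq:fileManagementRequestDetail xmlns:fmreq="http://foobar.com/filemanagement">
      <fmreq:property>
         <fmreq:name>form_category_cd</fmreq:name>
         <fmreq:value>Memos</fmreq:value>
      </fmreq:property>
      <fmreq:property>
         <fmreq:name>object_name</fmreq:name>
         <fmreq:value>Correspondence</fmreq:value>
      </fmreq:property>
</fmreq:fileManagementRequestDetail>
EOF

# Split each line on '<' or '>' so the element text becomes field 3.
# Once the object_name line has been seen, print field 3 of the next line and stop.
object_name=$(awk -F'[<>]' '
    found { print $3; exit }
    /<fmreq:name>object_name<\/fmreq:name>/ { found = 1 }
' "$xml")

printf '%s\n' "$object_name"
rm -f "$xml"
```

The order of the two rules matters: the found rule comes first so that it fires on the line after the match and not on the matching line itself.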

Breakdown of what this command does:

  • The option -n tells sed not to print the pattern space after it has finished processing the line. Consequently, you need to use the command p explicitly to do so.

  • /regex/ tells sed to execute the commands that follow only on the lines that match regex.

  • The sed command n replaces the content of the pattern space with the next line of input, which is the one containing the value you are interested in.

  • The sed command s/regex/replacement/ substitutes the first match of regex in the pattern space with replacement (the g flag used in the commands above makes it replace every match, which makes no difference here).

  • The sed command p prints the line.
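Putting these pieces together, the three steps from the question might be sketched as follows. This is only a sketch under some assumptions: each value element directly follows its name element, the only known mapping is the Correspondence -> YYZ pair from the question (the UNKNOWN fallback is invented), and the new value contains no /, & or \ characters, which would need escaping in the sed replacement:

```shell
# Recreate the sample from the question in a temporary file.
xml=$(mktemp)
cat > "$xml" <<'EOF'
<fmreq:fileManagementRequestDetail xmlns:fmreq="http://foobar.com/filemanagement">
      <fmreq:property>
         <fmreq:name>form_category_cd</fmreq:name>
         <fmreq:value>Memos</fmreq:value>
      </fmreq:property>
      <fmreq:property>
         <fmreq:name>object_name</fmreq:name>
         <fmreq:value>Correspondence</fmreq:value>
      </fmreq:property>
</fmreq:fileManagementRequestDetail>
EOF

# Step 1: read the value on the line after the object_name element.
object_name=$(sed -n '/<fmreq:name>object_name<\/fmreq:name>/ {n;s/^[[:space:]]*<fmreq:value>\(.*\)<\/fmreq:value>.*/\1/p;}' "$xml")

# Step 2: cross-reference. Only Correspondence -> YYZ comes from the question;
# the fallback value is a placeholder.
case $object_name in
    Correspondence) new_value=YYZ ;;
    *)              new_value=UNKNOWN ;;
esac

# Step 3: on the line after the form_category_cd element, swap the element text.
# No -n here, so all other lines pass through unchanged.
updated=$(sed '/<fmreq:name>form_category_cd<\/fmreq:name>/ {n;s/\(<fmreq:value>\).*\(<\/fmreq:value>\)/\1'"$new_value"'\2/;}' "$xml")

printf '%s\n' "$updated"
rm -f "$xml"
```

To change the real file, the output of step 3 could be redirected to a new file and moved over the original once it has been checked.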
