Print Text Between Two XML Tags – Using sed


Consider the following excerpt from a large XML file:

  ...
  <serverName someKey="false" anotherKey="0.05" thirdKey="0.04">
    <default>blah.blah.blah</default>
    <region name="US">us.blah.net</region>
    <region name="EU">eu.blah.net</region>
    <region name="IL">il.blah.net</region>
  </serverName>
  ...

How do I print the lines between the opening tag <serverName ...> and the closing tag </serverName>?

Best Answer

sed is a great tool, but XML will eventually make any programmer who approaches it with a regex cry. I know; I've been there. If there is even the smallest chance that the file's layout will change, you want a proper XML parser.
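To make the regex problem concrete, here is a small sketch in Python (Python rather than sed, since this answer ultimately recommends Python). It assumes a hypothetical input where an old, commented-out <serverName> block sits above the live one, and compares a naive regex with the standard-library xml.etree.ElementTree parser, which stands in here for "a proper XML parser". The function names and sample are made up for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample: a commented-out <serverName> block precedes the live one.
SAMPLE = """<config>
  <!-- <serverName><default>old.blah.net</default></serverName> -->
  <serverName someKey="false">
    <default>blah.blah.blah</default>
  </serverName>
</config>"""

def default_via_regex(text):
    # Naive regex: grabs the first <serverName ...>...</serverName> it sees,
    # which here is the one inside the comment.
    block = re.search(r"<serverName[^>]*>(.*?)</serverName>", text, re.S).group(1)
    return re.search(r"<default>(.*?)</default>", block).group(1)

def default_via_parser(text):
    # ElementTree's default parser drops comments, so only the live element exists.
    root = ET.fromstring(text)
    return root.find("serverName/default").text

print(default_via_regex(SAMPLE))
print(default_via_parser(SAMPLE))
```

The regex returns old.blah.net because it cannot tell it is inside a comment; the parser returns blah.blah.blah. Reordered attributes, CDATA sections, or tags split differently across lines can cause similar surprises for line- or regex-based tools.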

My choice would be BeautifulSoup, but it's a Python library, so using it directly from Bash is awkward. If you're willing to write an intermediary Python script, that's still an option. Otherwise, xpath is a fairly classic choice: it's a command-line wrapper around Perl's XML::XPath module, and it does some fairly powerful things.
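A minimal sketch of such an intermediary script follows. It swaps in Python's standard-library xml.etree.ElementTree for BeautifulSoup so there is nothing extra to install; the function name server_name_children and the script name print_children.py are hypothetical. It prints each child element of every <serverName> as a line of XML, which Bash can then pipe wherever it likes.

```python
import sys
import xml.etree.ElementTree as ET

def server_name_children(source):
    """Yield each child element of every <serverName>, serialized back to XML.

    `source` may be a filename or a file object; ET.parse accepts either.
    """
    tree = ET.parse(source)
    # iter() walks the whole tree, so <serverName> is found at any depth.
    for server in tree.getroot().iter("serverName"):
        for child in server:
            # tostring() would include the child's tail (the whitespace after it);
            # clearing it in this in-memory tree keeps one element per line.
            child.tail = None
            yield ET.tostring(child, encoding="unicode")

if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage from Bash (hypothetical script name):
    #   python3 print_children.py big_xml_file.xml
    for line in server_name_children(sys.argv[1]):
        print(line)
```

For the excerpt in the question this should print the <default> element followed by the three <region> elements, each on its own line, without the surrounding <serverName> tags.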

sudo apt-get install libxml-xpath-perl

For your example, here's how I'd do it with an XPath query:

xpath -e '*/serverName/*' big_xml_file.xml
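Note that the query is relative: evaluated from the document node, */serverName/* only matches <serverName> elements that are direct children of the root element, while //serverName/* matches them at any depth. The sketch below illustrates the difference using Python's xml.etree.ElementTree, whose limited XPath support evaluates paths relative to the root element (so serverName/* there plays roughly the role of */serverName/*). The sample document is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: one <serverName> directly under the root element,
# and another nested one level deeper.
DOC = """<config>
  <serverName><default>top.blah.net</default></serverName>
  <services>
    <serverName><default>nested.blah.net</default></serverName>
  </services>
</config>"""

root = ET.fromstring(DOC)  # root is the <config> element itself

# Relative to root: only <serverName> children of <config> are matched,
# roughly like XPath "*/serverName/*" evaluated from the document node.
shallow = [el.text for el in root.findall("serverName/*")]

# ".//" searches all descendants of root, roughly like XPath "//serverName/*".
anywhere = [el.text for el in root.findall(".//serverName/*")]

print(shallow)
print(anywhere)
```

Here shallow holds only top.blah.net, while anywhere holds both values in document order. With the xpath tool itself, the depth-agnostic version would be xpath -e '//serverName/*' big_xml_file.xml.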

Again, if you need to do anything more involved with this XML, consider something more powerful, like BeautifulSoup and Python.