I have the following sample output:
<HARDWARE>
<NAME>WIN1</NAME>
<OS>Windows 7</OS>
<IP>1.2.3.4</IP>
<DOMAIN>contoso.com</DOMAIN>
</HARDWARE>
<HARDWARE>
<NAME>WIN2</NAME>
<OS>Windows 8</OS>
<IP>10.20.30.40</IP>
<DOMAIN>contoso.com</DOMAIN>
</HARDWARE>
What is the best way to parse it so it will look like:
WIN1 Windows 7 1.2.3.4 contoso.com
WIN2 Windows 8 10.20.30.40 contoso.com
I'm looking for a solution that uses standard tools like awk, sed, etc.
Best Answer
Please don't use awk, sed, etc. They cannot handle XML properly. XML does a bunch of stuff like having whitespace, linefeeds, unary tags etc. that means regular expressions aren't very robust - they break messily, following a perfectly valid change to the XML down the line.

The way to handle XML is with a parser. xmlstarlet is one commonly used on Linux. Because I haven't seen it suggested yet, I'd use perl. The approach is to find the HARDWARE elements and print the text from the children of each one.

You could extend it a little to allow you to handle e.g. different field sets/ordering.
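It generates a hash (associative array) called %fields that, for each element, maps each child tag to its text (for the first element: NAME to WIN1, OS to Windows 7, IP to 1.2.3.4, DOMAIN to contoso.com). And then we use @fields_to_show to specify which to display and in which order, so each HARDWARE element is printed as one line containing just those fields.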
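As a rough illustration of the parser-based approach, here is a minimal sketch. It is written in Python with the standard library's xml.etree.ElementTree, as a stand-in rather than the Perl the answer recommends. The function names, the ROOT wrapper element and the ["NAME", "IP"] selection are assumptions made for the example; the fields dict and fields_to_show list play the roles of %fields and @fields_to_show.

```python
import xml.etree.ElementTree as ET

# Sample data from the question, wrapped in a single root element.
# The <ROOT> name is an assumption; any element name would do.
SAMPLE = """<ROOT>
<HARDWARE>
<NAME>WIN1</NAME>
<OS>Windows 7</OS>
<IP>1.2.3.4</IP>
<DOMAIN>contoso.com</DOMAIN>
</HARDWARE>
<HARDWARE>
<NAME>WIN2</NAME>
<OS>Windows 8</OS>
<IP>10.20.30.40</IP>
<DOMAIN>contoso.com</DOMAIN>
</HARDWARE>
</ROOT>"""


def all_children(xml_text):
    """One line per HARDWARE element: the text of every child, in document order."""
    root = ET.fromstring(xml_text)
    return [" ".join((child.text or "").strip() for child in hw)
            for hw in root.iter("HARDWARE")]


def selected_fields(xml_text, fields_to_show):
    """One line per HARDWARE element, limited to the named fields, in the given order."""
    root = ET.fromstring(xml_text)
    lines = []
    for hw in root.iter("HARDWARE"):
        # Counterpart of the %fields hash: child tag -> child text.
        fields = {child.tag: (child.text or "").strip() for child in hw}
        # A field missing from an element is rendered as an empty string.
        lines.append(" ".join(fields.get(name, "") for name in fields_to_show))
    return lines


if __name__ == "__main__":
    print("\n".join(all_children(SAMPLE)))
    print("\n".join(selected_fields(SAMPLE, ["NAME", "IP"])))
```

Because the parser does the tokenising, reformatting the XML (extra whitespace, children on one line, reordered children) does not change what selected_fields returns, which is the robustness argument made above.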
NB: I also had to 'fix' your XML, because without a single root tag it's invalid. Other answers have mentioned this. The XML spec is quite strict - broken XML should be rejected. So it's actually quite bad form to "fix" XML, and normally I'd suggest hitting whoever generated it around the head with a copy of the XML spec.
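To make the root-tag point concrete, here is a small sketch, again in Python's standard library as a stand-in. It shows a strict parser rejecting the root-less fragments and accepting them once wrapped. The helper names and the ROOT tag are hypothetical, and the wrapping assumes the input carries no XML declaration or DOCTYPE, since those may only appear at the very start of a document.

```python
import xml.etree.ElementTree as ET

# Two sibling elements with no enclosing root, like the output in the question.
FRAGMENTS = """<HARDWARE><NAME>WIN1</NAME></HARDWARE>
<HARDWARE><NAME>WIN2</NAME></HARDWARE>"""


def is_well_formed(text):
    """Return True if the parser accepts the text as a complete XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


def parse_fragments(text, root_tag="ROOT"):
    """Wrap root-less fragments in a synthetic root element and parse the result."""
    return ET.fromstring(f"<{root_tag}>{text}</{root_tag}>")


if __name__ == "__main__":
    print(is_well_formed(FRAGMENTS))           # the fragments alone are rejected
    root = parse_fragments(FRAGMENTS)
    print([hw.findtext("NAME") for hw in root])
```

This is a workaround on the consuming side only; the cleaner fix is still for the producer to emit a single root element.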