Linux – How to Use XMLSTARLET to Read Value from Large XML File

linux macos xml

I have a very large XML file and I simply need to read values from it. So far, I have not been successful with XMLSTARLET. I use the "sel" command and then try to provide a path to the item, but no luck. I have no idea what extra characters or fields to use. The entire string? Brackets? There's also a tool called "xml_grep", with which I assume I would use paths with Unix-like brackets to get the values.

Any ideas?

Best Answer

I suppose you want to read the values of certain elements in that XML file, and you already know how to specify those with XPath, e.g. //employee[@retired="no"].

Then, to get the value of a single field from each match:

xmlstarlet sel -t -v '//employee[@retired="no"]/name' file.xml

Or, when you want for example two elements from each employee, separated by a pipe character:

xmlstarlet sel -t -m '//employee[@retired="no"]' -v name -o "|" -v age -n file.xml

Basically, you specify a template (-t), followed by where the template needs to be applied (-m followed by the XPath to match), then followed by the items you want to extract from each match (-v for "value of"). The -o option outputs literal text, and -n (or --nl) ends the line.
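To make the template options concrete, here is a minimal sketch run against a tiny sample file. The file name employees.xml and the staff/employee/name/age structure are assumptions made up for illustration, since the question does not show the real document. On some systems the binary is installed as xml rather than xmlstarlet.

```shell
# Hypothetical sample data: element and attribute names are assumptions.
cat > employees.xml <<'EOF'
<?xml version="1.0"?>
<staff>
  <employee retired="no"><name>Alice</name><age>41</age></employee>
  <employee retired="yes"><name>Bob</name><age>67</age></employee>
  <employee retired="no"><name>Carol</name><age>35</age></employee>
</staff>
EOF

# -t  start a template
# -m  iterate over every node matching the XPath
# -v  print a value, evaluated relative to the current match
# -o  print literal text
# -n  print a newline after each match
xmlstarlet sel -t -m '//employee[@retired="no"]' -v name -o '|' -v age -n employees.xml
```

With this sample file the command prints Alice|41 and Carol|35 on separate lines; Bob is skipped because his retired attribute is "yes".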

The program 'xml_grep', on the other hand, is part of XML::Twig. The idea of XML::Twig is that it does not read the whole XML file into memory, but works on the file twig by twig. As a result, you cannot specify certain XPaths (e.g. ones referring to sibling elements). When the XPaths are simple enough to be evaluated on a twig-by-twig basis, this program does indeed allow very large files to be processed while using only a limited amount of memory.
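As a rough sketch of what a simple xml_grep call can look like, the following prints the text of every name element. The sample file and its element names are assumptions for illustration, and the condition is deliberately kept to a bare element name, since XML::Twig supports only a subset of XPath. Check the xml_grep manual page of your installed version before relying on the options.

```shell
# Hypothetical sample data: element names are assumptions.
cat > twig_sample.xml <<'EOF'
<staff>
  <employee><name>Alice</name></employee>
  <employee><name>Bob</name></employee>
</staff>
EOF

# --cond       the condition (XPath subset) that selects elements
# --text_only  print only the text of each result, one per line
xml_grep --cond 'name' --text_only twig_sample.xml
```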

You did not give enough information about the structure of the XML file, or the kind of XPaths whose values you want, for me to be more specific here.

And, of course, because XML is just a text format, depending on the layout of the file and the complexity of the queries, other plain-text programs may work too:

grep -o '<name>[^<]*</name>' file.xml
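If only the text inside the tags is wanted, the grep output can be piped through sed to strip the markup. This is a sketch under the assumption that each name element sits on a single line and contains no nested tags, attributes, CDATA sections or entities; the sample file is made up for illustration. For anything less regular, a real XML tool is the safer choice.

```shell
# Hypothetical sample data: element names are assumptions.
printf '%s\n' \
  '<staff><employee><name>Alice</name></employee>' \
  '<employee><name>Bob</name></employee></staff>' > grep_sample.xml

# grep -o prints only the matching part of each line;
# sed then removes every <...> tag, leaving the bare text.
grep -o '<name>[^<]*</name>' grep_sample.xml | sed -e 's/<[^>]*>//g'
```

For this sample the pipeline prints Alice and Bob, one per line.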