Parsing XML, JSON, and newer data file formats in UNIX using command line utilities

text-processing, xml

The Unix environment has some excellent tools for parsing text in various forms. Lately, however, data no longer arrives in the traditional delimiter-based formats (CSV, TSV, record-based, and the like) it used to. These days data is exchanged in structured formats such as XML and JSON.

I know there are good tools like sed, awk, and Perl that can chew through nearly any form of data. However, to work with this sort of structured data, one often has to write a complete program: given the little time available to extract information, one has to sit down, figure out the whole logic of the query, and put it down programmatically. Sometimes that is not acceptable, partly because the extracted information feeds into further work, and partly because of the time it takes to find the appropriate approach and code it up. What is needed is a command line tool with enough switches to find, query, and dump data.

I'm looking for tools that take XML/JSON or other structured data and dump it into formats like CSV, so that from there one could use other commands to extract any information from it.

Are there any command line utilities you know of which do this kind of job? Are there existing awk/Perl scripts available to do this?

Best Answer

For XML there is XMLStarlet: http://xmlstar.sourceforge.net/

XMLStarlet is a set of command line utilities (tools) which can be used to transform, query, validate, and edit XML documents and files using a simple set of shell commands, in a similar way to how it is done for plain text files using the UNIX grep, sed, awk, diff, patch, join, etc. commands.

You can also use xsltproc and similar tools (e.g. Saxon).

For JSON, I think it's better to just use Python, Ruby, or Perl and transform it.
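As a sketch of that approach, a few lines of Python's standard library turn a JSON array of records into CSV (the input data here is made up for illustration):

```python
import json, csv, io, sys

# Hypothetical JSON records, e.g. piped in from an API dump
data = json.loads('[{"name": "alpha", "size": 10}, {"name": "beta", "size": 20}]')

# Write the records out as CSV with a header row
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["name", "size"])
writer.writeheader()
writer.writerows(data)
sys.stdout.write(buf.getvalue())
```

In practice you would read from `sys.stdin` instead of a literal string, so the script drops straight into a pipeline.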
