Convert JSON to CSV – Using JQ for JSON Array Conversion

csvjqjson

I'm looking at solutions to convert JSON to a CSV. It seems most of the solutions expect the JSON to be a single object rather than an array of objects.

All the solutions I've tried from here seem to break with my input, which comes from curling this site.

How can I convert the JSON into a CSV with jq or another tool when the input is an array instead of an object.

[
  {
    "id": "4",
    "link": "https://pressbooks.online.ucf.edu/amnatgov/",
    "metadata": {
      "@context": "http://schema.org",
      "@type": "Book",
      "name": "American Government",
      "inLanguage": "en",
      "copyrightYear": "2016",
      "disambiguatingDescription": "The content of this textbook has been developed and arranged to provide a logical progression from the fundamental principles of institutional design at the founding, to avenues of political participation, to thorough coverage of the political structures that constitute American government. The book builds upon what students have already learned and emphasizes connections between topics as well as between theory and applications. The goal of each section is to enable students not just to recognize concepts, but to work with them in ways that will be useful in later courses, future careers, and as engaged citizens. ",
      "image": "https://pressbooks.online.ucf.edu/app/uploads/sites/4/2020/01/American-Government.png",
      "isBasedOn": "https://ucf-dev.pb.unizin.org/pos2041",
      "author": [
        {
          "@type": "Person",
          "name": "OpenStax"
        }
      ],
      "datePublished": "2016-01-06",
      "copyrightHolder": {
        "@type": "Organization",
        "name": "cnxamgov"
      },
      "license": {
        "@type": "CreativeWork",
        "url": "https://creativecommons.org/licenses/by/4.0/",
        "name": "CC BY (Attribution)"
      }
    },
    "_links": {
      "api": [
        {
          "href": "https://pressbooks.online.ucf.edu/amnatgov/wp-json/"
        }
      ],
      "metadata": [
        {
          "href": "https://pressbooks.online.ucf.edu/amnatgov/wp-json/pressbooks/v2/metadata"
        }
      ],
      "self": [
        {
          "href": "https://pressbooks.online.ucf.edu/wp-json/pressbooks/v2/books/4"
        }
      ]
    }
  }
]

Desired Format:

id, link, context, type, name, inLanguage, image, author_type, author_name, license_type, license_url, license_name

Best Answer

The issue is not really that the JSON that you show is an array, but that each element of the array (of which you only have one) is a fairly complex structure. It is straight forward to extract the relevant data from each array entry into a shorter flat array and convert that into CSV with @csv in jq:

jq -r '.[] | [
        .id,
        .link,
        .metadata."@context",
        .metadata."@type",
        .metadata.name,
        .metadata.inLanguage,
        .metadata.image,
        .metadata.author[0]."@type",
        .metadata.author[0].name,
        .metadata.license."@type",
        .metadata.license.url,
        .metadata.license.name
] | @csv' file.json

... but notice how I'm forced to decide that we're only ever interested in the first author (the .metadata.author sub-structure is an array).

The output:

"4","https://pressbooks.online.ucf.edu/amnatgov/","http://schema.org","Book","American Government","en","https://pressbooks.online.ucf.edu/app/uploads/sites/4/2020/01/American-Government.png","Person","OpenStax","CreativeWork","https://creativecommons.org/licenses/by/4.0/","CC BY (Attribution)"

To create author name strings that are concatenations of all author names (and similarly for author types), with ; as delimiter, you could instead of .metadata.author[0].name in the above use [.metadata.author[].name]|join(";") (and [.metadata.author[]."@type"]|join(";") for the type), so that your command becomes

jq -r '.[] | [
        .id,
        .link,
        .metadata."@context",
        .metadata."@type",
        .metadata.name,
        .metadata.inLanguage,
        .metadata.image,
        ( [ .metadata.author[]."@type" ] | join(";") ),
        ( [ .metadata.author[].name    ] | join(";") ),
        .metadata.license."@type",
        .metadata.license.url,
        .metadata.license.name
] | @csv' file.json
Related Question