How to Sort a Stream of JSON Objects by Field Value Using jq

jqjson

I'm starting with json that looks like this:

{
  "object": "list",
  "data": [
    {
      "id": "in_1HW85aFGUwFHXzvl8wJbW7V7",
      "object": "invoice",
      "account_country": "US",
      "customer_name": "clientOne",
      "date": 1601244686,
      "livemode": true,
      "metadata": {},
      "paid": true,
      "status": "paid",
      "total": 49500
    },
    {
      "id": "in_1HJlIZFGUwFHXzvlWqhegRkf",
      "object": "invoice",
      "account_country": "US",
      "customer_name": "clientTwo",
      "date": 1598297143,
      "livemode": true,
      "metadata": {},
      "paid": true,
      "status": "paid",
      "total": 51000
    },
    {
      "id": "in_1HJkg5FGUwFHXzvlYp2uC63C",
      "object": "invoice",
      "account_country": "US",
      "customer_name": "clientThree",
      "date": 1598294757,
      "livemode": true,
      "metadata": {},
      "paid": true,
      "status": "paid",
      "total": 57000
    },
    {
      "id": "in_1H8B0pFGUwFHXzvlU6nrOm6I",
      "object": "invoice",
      "account_country": "US",
      "customer_name": "clientThree",
      "date": 1595536051,
      "livemode": true,
      "metadata": {},
      "paid": true,
      "status": "paid",
      "total":  20000
    }
  ],
  "has_more": true,
  "url": "/v1/invoices"
}

If I do

cat sample.json | jq -C '.data[] | {invoice_id: .id, date: .date | strftime("%Y-%m-%d"), amount: .total} | .amount = "$" + (.amount/100|tostring)'

I can successfully tidy this up (the actual data is far more verbose, hundreds of lines to eliminate) and so that gives me:

{
  "invoice_id": "in_1HW85aFGUwFHXzvl8wJbW7V7",
  "date": "2020-09-27",
  "amount": "$495"
}
{
  "invoice_id": "in_1HJlIZFGUwFHXzvlWqhegRkf",
  "date": "2020-08-24",
  "amount": "$510"
}
{
  "invoice_id": "in_1HJkg5FGUwFHXzvlYp2uC63C",
  "date": "2020-08-24",
  "amount": "$570"
}
{
  "invoice_id": "in_1H8B0pFGUwFHXzvlU6nrOm6I",
  "date": "2020-07-23",
  "amount": "$200"
}

But that's in the wrong order. I want to sort by the date field, so that the most recent item is displayed last at the bottom.

I've tried every wrong thing imaginable. How do I apply sort_by(.date) to this? I keep getting cannot index string with string "date" errors (and miscellaneous others, but mostly that one).

Best Answer

from man jq

sort, sort_by(path_expression) The sort functions sorts its input, which must be an array.

In general and invoking a separate jq command, you have to use -s, --slurp that will make these sequential objects an array, and then you can sort it by a key.

... | jq -s 'sort_by(.date)'

Now, if you have a selection already and you want that result to be an array, then I guess wrapping it all with brackets will make it:

jq '[ <some_existing_selection> ] | sort_by(.date)' file.json

example

For the json you are starting with, assuming that initially you are doing something like this (producing a sequence of objects):

jq '.data[] | {id: .id, date: .date}' file.json

you have to wrap the whole jq selection in brackets to make it an array:

jq '[.data[] | {id: .id, date: .date}]' file.json

and now this array can be sorted:

jq '[.data[] | {id: .id, date: .date}] | sort_by(.date)' file.json
Related Question