How to parse JSON with shell scripting in Linux

awkjsonsedtext processing

I have a JSON output from which I need to extract a few parameters in Linux.

This is the JSON output:

{
        "OwnerId": "121456789127",
        "ReservationId": "r-48465168",
        "Groups": [],
        "Instances": [
            {
                "Monitoring": {
                    "State": "disabled"
                },
                "PublicDnsName": null,
                "RootDeviceType": "ebs",
                "State": {
                    "Code": 16,
                    "Name": "running"
                },
                "EbsOptimized": false,
                "LaunchTime": "2014-03-19T09:16:56.000Z",
                "PrivateIpAddress": "10.250.171.248",
                "ProductCodes": [
                    {
                        "ProductCodeId": "aacglxeowvn5hy8sznltowyqe",
                        "ProductCodeType": "marketplace"
                    }
                ],
                "VpcId": "vpc-86bab0e4",
                "StateTransitionReason": null,
                "InstanceId": "i-1234576",
                "ImageId": "ami-b7f6c5de",
                "PrivateDnsName": "ip-10-120-134-248.ec2.internal",
                "KeyName": "Test_Virginia",
                "SecurityGroups": [
                    {
                        "GroupName": "Test",
                        "GroupId": "sg-12345b"
                    }
                ],
                "ClientToken": "VYeFw1395220615808",
                "SubnetId": "subnet-12345314",
                "InstanceType": "t1.micro",
                "NetworkInterfaces": [
                    {
                        "Status": "in-use",
                        "SourceDestCheck": true,
                        "VpcId": "vpc-123456e4",
                        "Description": "Primary network interface",
                        "NetworkInterfaceId": "eni-3619f31d",
                        "PrivateIpAddresses": [
                            {
                                "Primary": true,
                                "PrivateIpAddress": "10.120.134.248"
                            }
                        ],
                        "Attachment": {
                            "Status": "attached",
                            "DeviceIndex": 0,
                            "DeleteOnTermination": true,
                            "AttachmentId": "eni-attach-9210dee8",
                            "AttachTime": "2014-03-19T09:16:56.000Z"
                        },
                        "Groups": [
                            {
                                "GroupName": "Test",
                                "GroupId": "sg-123456cb"
                            }
                        ],
                        "SubnetId": "subnet-31236514",
                        "OwnerId": "109030037527",
                        "PrivateIpAddress": "10.120.134.248"
                    }
                ],
                "SourceDestCheck": true,
                "Placement": {
                    "Tenancy": "default",
                    "GroupName": null,
                    "AvailabilityZone": "us-east-1c"
                },
                "Hypervisor": "xen",
                "BlockDeviceMappings": [
                    {
                        "DeviceName": "/dev/sda",
                        "Ebs": {
                            "Status": "attached",
                            "DeleteOnTermination": false,
                            "VolumeId": "vol-37ff097b",
                            "AttachTime": "2014-03-19T09:17:00.000Z"
                        }
                    }
                ],
                "Architecture": "x86_64",
                "KernelId": "aki-88aa75e1",
                "RootDeviceName": "/dev/sda1",
                "VirtualizationType": "paravirtual",
                "Tags": [
                    {
                        "Value": "Server for testing RDS feature in us-east-1c AZ",
                        "Key": "Description"
                    },
                    {
                        "Value": "RDS_Machine (us-east-1c)",
                        "Key": "Name"
                    },
                    {
                        "Value": "1234",
                        "Key": "cost.centre",
                      },
                    {
                        "Value": "Jyoti Bhanot",
                        "Key": "Owner",
                      }
                ],
                "AmiLaunchIndex": 0
            }
        ]
    }

I want to write a file that contains heading like instance id, tag like name, cost center, owner. and below that certain values from the JSON output. The output here given is just an example.

How can I do that using sed and awk?

Expected output :

 Instance id         Name                           cost centre             Owner
    i-1234576          RDS_Machine (us-east-1c)        1234                   Jyoti

Best Answer

The availability of parsers in nearly every programming language is one of the advantages of JSON as a data-interchange format.

Rather than trying to implement a JSON parser, you are likely better off using either a tool built for JSON parsing such as jq or a general purpose script language that has a JSON library.

For example, using jq, you could pull out the ImageID from the first item of the Instances array as follows:

jq '.Instances[0].ImageId' test.json

Alternatively, to get the same information using Ruby's JSON library:

ruby -rjson -e 'j = JSON.parse(File.read("test.json")); puts j["Instances"][0]["ImageId"]'

I won't answer all of your revised questions and comments but the following is hopefully enough to get you started.

Suppose that you had a Ruby script that could read a from STDIN and output the second line in your example output[0]. That script might look something like:

#!/usr/bin/env ruby
require 'json'

data = JSON.parse(ARGF.read)
instance_id = data["Instances"][0]["InstanceId"]
name = data["Instances"][0]["Tags"].find {|t| t["Key"] == "Name" }["Value"]
owner = data["Instances"][0]["Tags"].find {|t| t["Key"] == "Owner" }["Value"]
cost_center = data["Instances"][0]["SubnetId"].split("-")[1][0..3]
puts "#{instance_id}\t#{name}\t#{cost_center}\t#{owner}"

How could you use such a script to accomplish your whole goal? Well, suppose you already had the following:

  • a command to list all your instances
  • a command to get the json above for any instance on your list and output it to STDOU

One way would be to use your shell to combine these tools:

echo -e "Instance id\tName\tcost centre\tOwner"
for instance in $(list-instances); do
    get-json-for-instance $instance | ./ugly-ruby-scriptrb
done

Now, maybe you have a single command that give you one json blob for all instances with more items in that "Instances" array. Well, if that is the case, you'll just need to modify the script a bit to iterate through the array rather than simply using the first item.

In the end, the way to solve this problem, is the way to solve many problems in Unix. Break it down into easier problems. Find or write tools to solve the easier problem. Combine those tools with your shell or other operating system features.

[0] Note that I have no idea where you get cost-center from, so I just made it up.

Related Question