Python JSON Parsing – How to Parse JSON Using Python

bash json python

I have a JSON file members.json as below.

{
   "took": 670,
   "timed_out": false,
   "_shards": {
      "total": 8,
      "successful": 8,
      "failed": 0
   },
   "hits": {
      "total": 74,
      "max_score": 1,
      "hits": [
         {
            "_index": "2000_270_0",
            "_type": "Medical",
            "_id": "02:17447847049147026174478:174159",
            "_score": 1,
            "_source": {
               "memberId": "0x7b93910446f91928e23e1043dfdf5bcf",
               "memberFirstName": "Uri",
               "memberMiddleName": "Prayag",
               "memberLastName": "Dubofsky"
            }
         }, 
         {
            "_index": "2000_270_0",
            "_type": "Medical",
            "_id": "02:17447847049147026174478:174159",
            "_score": 1,
            "_source": {
               "memberId": "0x7b93910446f91928e23e1043dfdf5bcG",
               "memberFirstName": "Uri",
               "memberMiddleName": "Prayag",
               "memberLastName": "Dubofsky"
            }
         }
      ]
   }
}

I want to parse it with a bash script and get only the list of memberId values.

The expected output is:

memberIds
----------- 
0x7b93910446f91928e23e1043dfdf5bcf
0x7b93910446f91928e23e1043dfdf5bcG

I tried adding the following bash+python code to .bashrc:

function getJsonVal() {
   if [ \( $# -ne 1 \) -o \( -t 0 \) ]; then
       echo "Usage: getJsonVal 'key' < /tmp/file";
       echo "   -- or -- ";
       echo " cat /tmp/input | getJsonVal 'key'";
       return;
   fi;
   cat | python -c 'import json,sys;obj=json.load(sys.stdin);print obj["'$1'"]';
}

And then called:

$ cat members.json | getJsonVal "memberId"

But it throws:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
KeyError: 'memberId'

Reference

https://stackoverflow.com/a/21595107/432903

Best Answer

If you run:

$ cat members.json | \
    python -c 'import json,sys;obj=json.load(sys.stdin);print(obj)'

you can inspect the structure of the nested dictionary obj and see that your original line should read:

$ cat members.json | \
    python -c 'import json,sys;obj=json.load(sys.stdin);print(obj["hits"]["hits"][0]["_source"]["'$1'"])'

to get to that "memberId" element. This way you can keep the Python as a one-liner.
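To see why the original lookup raised KeyError, here is a minimal sketch in Python 3. It assumes a trimmed-down copy of members.json inlined as a string, and the variable names are illustrative only. It lists the top-level keys and then walks the nested path to the first memberId:

```python
import json

# Trimmed-down copy of members.json, inlined so the sketch is self-contained.
doc = json.loads("""
{
  "took": 670,
  "hits": {
    "total": 74,
    "hits": [
      {"_source": {"memberId": "0x7b93910446f91928e23e1043dfdf5bcf"}}
    ]
  }
}
""")

# "memberId" is not a top-level key, which is why obj["memberId"] fails.
top_level_keys = sorted(doc)

# The value lives under hits -> hits -> [index] -> _source.
first_member_id = doc["hits"]["hits"][0]["_source"]["memberId"]

print(top_level_keys)
print(first_member_id)
```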

If there are multiple elements in the nested "hits" element, then you can do something like:

$ cat members.json | \
python -c '
import json, sys
obj = json.load(sys.stdin)
# print the requested field from every entry in the nested "hits" list
for y in [x["_source"]["'$1'"] for x in obj["hits"]["hits"]]:
    print(y)
'

Chris Down's solution is better for finding a single value for a (unique) key at any level.
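That solution is not reproduced here. As a rough illustration of the general idea of looking a key up at any depth, a recursive walk could look like the sketch below. The helper name find_values and the inline sample are assumptions made for this example, not part of the referenced answer:

```python
import json

def find_values(node, key):
    """Yield every value stored under `key`, at any depth of `node`."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield v
            # Keep descending, since the same key may also appear deeper down.
            yield from find_values(v, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_values(item, key)

# Inline sample shaped like members.json (hypothetical, shortened).
doc = json.loads("""
{
  "hits": {
    "hits": [
      {"_source": {"memberId": "0x7b93910446f91928e23e1043dfdf5bcf"}},
      {"_source": {"memberId": "0x7b93910446f91928e23e1043dfdf5bcG"}}
    ]
  }
}
""")

found = list(find_values(doc, "memberId"))
for value in found:
    print(value)
```

Because the walk does not depend on the path, it keeps working if the nesting changes, at the cost of also matching keys with the same name in unrelated parts of the document.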

With my second example, which prints multiple values, you are hitting the limits of what you should attempt with a one-liner. At that point I see little reason to do half of the processing in bash, and would move to a complete Python solution.
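A complete Python version might look like the following sketch, written for Python 3. The function name print_member_ids and the inline sample are assumptions for illustration. The function takes any text stream, so the same code can be fed sys.stdin in a real script:

```python
import io
import json
import sys

def print_member_ids(stream, out=sys.stdout):
    """Read a document shaped like members.json and print each memberId."""
    doc = json.load(stream)
    print("memberIds", file=out)
    print("-----------", file=out)
    # .get() with defaults avoids a KeyError when a level is missing.
    for hit in doc.get("hits", {}).get("hits", []):
        member_id = hit.get("_source", {}).get("memberId")
        if member_id is not None:
            print(member_id, file=out)

# Demo on an inline sample; a real script would pass sys.stdin instead.
sample = io.StringIO("""
{
  "hits": {
    "hits": [
      {"_source": {"memberId": "0x7b93910446f91928e23e1043dfdf5bcf"}},
      {"_source": {"memberId": "0x7b93910446f91928e23e1043dfdf5bcG"}}
    ]
  }
}
""")
buffer = io.StringIO()
print_member_ids(sample, out=buffer)
print(buffer.getvalue(), end="")
```

If the demo at the bottom were replaced by a call to print_member_ids(sys.stdin) and the file saved under a name such as member_ids.py (hypothetical), it could be run as python3 member_ids.py < members.json.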
