Ubuntu – grepping patterns in a json file

command line, grep, json, text processing

How can I select the lines from my text files that look like this one?

"created_at": "Wed Oct 19 12:36:54 +0000 2016"

Basically, I need to find lines containing a pattern that

  • starts with Wed Oct 19 and
  • ends with 2016

However, the timestamp Wed Oct 19 12:36:54 +0000 2016 could be anywhere in the line, and any time of day could appear in between.

When I use

grep -irn "Wed Oct 19" | grep -irn "2016"

I get all sorts of unwanted results.

Here's an example of a similar line from the file I don't want to match:

"created_at": "Tue Jan 31 18:50:26 +0000 2012",

This is part of a tweet's attributes.

Here's a longer part of the input:

 "contributors": null, 
      "retweeted": false, 
      "in_reply_to_user_id_str": null, 
      "place": null, 
      "retweet_count": 4, 
      "created_at": "Sun Apr 03 23:48:36 +0000 2011", 
      "retweeted_status": {
            "text": "In preparation for the NFL lockout, I will be spending twice as much time analyzing my fantasy baseball team during company time. #PGP", 
            "truncated": false, 
            "in_reply_to_user_id": null, 
            "in_reply_to_status_id": null, 

Complete example input here:
https://gist.github.com/hrp/900964

UPDATE: I am looking for the file names that contain this pattern in them.

Best Answer

If it could be anywhere in the line, and anything could be in between, I guess

grep -wirn 'Wed Oct 19 .* 2016' *

should get it...

If you only want the filenames, use -l

grep -wirl 'Wed Oct 19 .* 2016' *

Notes

  • -w matches only at word boundaries, in case the text you want is stuck onto something else we don't want to match (unlikely in this case)
  • -l prints just the names of the files that contain a match
  • .* matches any number of any characters (including none)
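To see what the .* wildcard lets through, here is a small sketch run against throwaway files. The file names and their contents are made up for the demo, and the stricter pattern assumes the timestamp always has the layout shown in the question (HH:MM:SS, a four-digit UTC offset, then the year).

```shell
# Throwaway sample data; file names and contents are invented for this demo.
tmp=$(mktemp -d)
printf '%s\n' '"created_at": "Wed Oct 19 12:36:54 +0000 2016",' > "$tmp/match.json"
printf '%s\n' '"created_at": "Tue Jan 31 18:50:26 +0000 2012",' > "$tmp/other.json"
# Right day, wrong year, but "2016" shows up later on the same line.
printf '%s\n' '"created_at": "Wed Oct 19 08:00:00 +0000 2011", "text": "back in 2016"' > "$tmp/tricky.json"

# The pattern from the answer: .* accepts anything between the date and the year.
loose=$(cd "$tmp" && grep -wirl 'Wed Oct 19 .* 2016' * | sort)

# A stricter variant (an assumption about the format): spell out time and offset.
strict=$(cd "$tmp" && grep -rlE 'Wed Oct 19 [0-9]{2}:[0-9]{2}:[0-9]{2} [+-][0-9]{4} 2016' * | sort)

printf 'loose:\n%s\n' "$loose"
printf 'strict:\n%s\n' "$strict"
```

The loose pattern also lists tricky.json, because .* happily spans the rest of the timestamp and the tweet text. Whether that matters depends on how likely such lines are in your data.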

It's probably OK to parse this file with grep, especially for something so simple, but using a JSON parser as mentioned in David Foerster's answer is the Right Way (i.e. it will likely be more reliable, especially if you need to do anything complex).
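For comparison, a parser-based version might look roughly like the sketch below. It uses jq, which is my own choice and not necessarily what the other answer uses, and it assumes jq is installed and that each file holds valid JSON. The sample files are invented for the demo.

```shell
# Throwaway sample files; names and contents are invented for this demo.
tmp=$(mktemp -d)
printf '%s\n' '{"created_at": "Wed Oct 19 12:36:54 +0000 2016", "text": "hi"}' > "$tmp/a.json"
# Here the date words appear only in the tweet text, not in created_at.
printf '%s\n' '{"created_at": "Tue Jan 31 18:50:26 +0000 2012", "text": "Wed Oct 19 was fun, see you in 2016"}' > "$tmp/b.json"

hits=$(
    for f in "$tmp"/*.json; do
        # Print every created_at value at any nesting depth, one per line,
        # then require the whole value (-x) to match the date pattern.
        if jq -r '.. | .created_at? // empty' "$f" 2>/dev/null |
           grep -qx 'Wed Oct 19 .* 2016'; then
            printf '%s\n' "${f##*/}"
        fi
    done
)
printf '%s\n' "$hits"
```

Because only the created_at values reach grep, text elsewhere in the tweet cannot cause a match, which is the kind of reliability a plain line-based grep cannot give.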