How to extract logs between two time stamps

awkgrepsedtext processing

I want to extract all logs between two timestamps. Some lines may not have the timestamp, but I want those lines also. In short, I want every line that falls under two time stamps. My log structure looks like:

[2014-04-07 23:59:58] CheckForCallAction [ERROR] Exception caught in +CheckForCallAction :: null
--Checking user--
Post
[2014-04-08 00:00:03] MobileAppRequestFilter [DEBUG] Action requested checkforcall

Suppose I want to extract everything between 2014-04-07 23:00 and 2014-04-08 02:00.

Please note the start time stamp or end time stamp may not be there in the log, but I want every line between these two time stamps.

Best Answer

You can use awk for this:

$ awk -F'[]]|[[]' \
  '$0 ~ /^\[/ && $2 >= "2014-04-07 23:00" { p=1 }
   $0 ~ /^\[/ && $2 >= "2014-04-08 02:00" { p=0 }
                                        p { print $0 }' log

Where:

  • -F specifies the characters [ and ] as field separators using a regular expression
  • $0 references a complete line
  • $2 references the date field
  • p is used as boolean variable that guards the actual printing
  • $0 ~ /regex/ is true if regex matches $0
  • >= is used for lexicographically comparing string (equivalent to e.g. strcmp())

Variations

The above command line implements right-open time interval matching. To get closed interval semantics just increment your right date, e.g.:

$ awk -F'[]]|[[]' \
  '$0 ~ /^\[/ && $2 >= "2014-04-07 23:00"    { p=1 }
   $0 ~ /^\[/ && $2 >= "2014-04-08 02:00:01" { p=0 }
                                           p { print $0 }' log

In case you want to match timestamps in another format you have to modify the $0 ~ /^\[/ sub-expression. Note that it used to ignore lines without any timestamps from print on/off logic.

For example for a timestamp format like YYYY-MM-DD HH24:MI:SS (without [] braces) you could modify the command like this:

$ awk \
  '$0 ~ /^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-2][0-9]:[0-5][0-9]:[0-5][0-9]/
      {
        if ($1" "$2 >= "2014-04-07 23:00")     p=1;
        if ($1" "$2 >= "2014-04-08 02:00:01")  p=0;
      }
    p { print $0 }' log

(note that also the field separator is changed - to blank/non-blank transition, the default)

Related Question