Grep log and get text between log delimiters

grep, subversion

Is there a way to grep a log and get all the text between the log entry delimiters?
Our log file separates entries with lines of hyphens ("-------").
When I search for a word, I want every line of the matching entry, that is, everything between the delimiters before and after the match.

Sample log

------------------------------------------------------------------------

    r132279 | USERID | 2014-04-30 12:59:09 -0700 (Wed, 30 Apr 2014) | 3 lines
    Removed unused "Calculated Fields" column entry.
    Jira ID: JIRA-977

------------------------------------------------------------------------

In the sample above I would grep for the word Fields, but I want all the lines between the "----" lines.

Best Answer

If you know how big the record is, you can have grep print additional lines of context before (-B) and after (-A) the matching line, e.g.

grep -A2 -B2 'Fields' sample.log

or, for the same number of context lines both before and after the matching line,

grep -C3 'Fields' sample.log

As far as I know, the only way to do a true multiline match (rather than a single line match plus context) in GNU grep is by using the PCRE regex mode (-P) with the -z flag to prevent breaking on newlines. For example, you could try

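grep -zPo '(\n-+\n)\K(.|\n)+?Fields(.|\n)+?(?=\n-+\n)' sample.log

which does a non-greedy match of the string Fields surrounded by any characters OR newlines, provided it is bookended by the newline-hyphens-newline delimiters. Note that the regex engine still takes the leftmost possible match, so if Fields does not occur in the first record, the output starts at an earlier delimiter and spans the records in between. An equivalent expression in pcregrep is

pcregrep -Mo '(\n-+\n)\K(.|\n)+?Fields(.|\n)+?(?=\n-+\n)' sample.log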

Another option for this kind of record-structured data is awk: in particular, GNU awk allows a regular expression to be used for the input record separator RS, e.g.

$ gawk -vRS='\n-+\n' '/Fields/ {print}' sample.log

r132279 | USERID | 2014-04-30 12:59:09 -0700 (Wed, 30 Apr 2014) | 3 lines

Removed unused "Calculated Fields" column entry.

Jira ID: JIRA-977
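Where gawk is not available, a regular expression in RS may not work (POSIX leaves the behaviour of a multi-character RS unspecified). The following is only a minimal sketch for a POSIX awk that buffers the lines between delimiter lines instead. The helper name records_matching and the second record in the sample file are made up for illustration, and it assumes that a delimiter is a line consisting solely of hyphens.

```shell
# Illustrative sample log; the second record is invented for the example.
cat > sample.log <<'EOF'
------------------------------------------------------------------------
r132279 | USERID | 2014-04-30 12:59:09 -0700 (Wed, 30 Apr 2014) | 3 lines
Removed unused "Calculated Fields" column entry.
Jira ID: JIRA-977
------------------------------------------------------------------------
r132280 | USERID | 2014-04-30 13:05:00 -0700 (Wed, 30 Apr 2014) | 2 lines
Fixed typo in header.
------------------------------------------------------------------------
EOF

# records_matching PATTERN FILE (hypothetical helper)
# Prints every record (the lines between two hyphen-only lines) that
# contains a line matching the regular expression PATTERN.
records_matching() {
    awk -v pat="$1" '
        /^-+$/ {                        # delimiter: the current record ends
            if (hit) printf "%s", buf
            buf = ""; hit = 0
            next
        }
        { buf = buf $0 "\n" }           # collect the lines of the record
        $0 ~ pat { hit = 1 }            # note that this record matched
        END { if (hit) printf "%s", buf }   # last record without a closing delimiter
    ' "$2"
}

records_matching 'Fields' sample.log
```

Unlike the single-regex approach, each record is tested on its own, so a match can never spill over into neighbouring records.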

Related Solutions

How to remove text matching specific patterns from a file

If your timestamps are consistently formatted, you could strip them off (with sed, for example) before comparing the files with whatever differencing method you prefer, e.g.

diff <(sed -E 's|[0-9]{2}:[0-9]{2}:[0-9]{2} [0-9]{2}/[0-9]{2}/[0-9]{2,4} [0-9]{1,} ||' fileA) <(sed -E 's|[0-9]{2}:[0-9]{2}:[0-9]{2} [0-9]{2}/[0-9]{2}/[0-9]{2,4} [0-9]{1,} ||' fileB)

Testing on your supplied input files:

$ diff \
<(sed -E 's|[0-9]{2}:[0-9]{2}:[0-9]{2} [0-9]{2}/[0-9]{2}/[0-9]{2,4} [0-9]{1,} ||' fileA) \
<(sed -E 's|[0-9]{2}:[0-9]{2}:[0-9]{2} [0-9]{2}/[0-9]{2}/[0-9]{2,4} [0-9]{1,} ||' fileB)
2,3c2,3
< abc xxx
< ghi eee ddd
---
> abc def
> ghi fff ddd
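Since the same sed expression is applied to both files, it can be wrapped in a small shell function to avoid repeating it. This is just a sketch: the name strip_ts is hypothetical, and the expression is copied unchanged from the command above.

```shell
# strip_ts FILE (hypothetical helper)
# Prints FILE with the first "HH:MM:SS DD/MM/YY[YY] N " prefix found on
# each line removed, using the same expression as the command above.
strip_ts() {
    sed -E 's|[0-9]{2}:[0-9]{2}:[0-9]{2} [0-9]{2}/[0-9]{2}/[0-9]{2,4} [0-9]{1,} ||' "$1"
}

# Usage in a shell with process substitution (bash, zsh, ksh93):
#   diff <(strip_ts fileA) <(strip_ts fileB)
```

Lines that carry no timestamp pass through unchanged, because the substitution simply does not match them.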

Logs – Filtering Multi-lines from a Log

There is no need to combine several tools. The task can be done with sed alone:

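sed '/^INFO\|^DEBUG\|^TRACE\|^ERROR/{
    /Logger2/{
        :1
        # append the next input line to the pattern space
        N
        # join it to the entry unless it starts a new log entry
        /\nINFO\|\nDEBUG\|\nTRACE\|\nERROR/!s/\n//
        # loop while a line was joined and input remains
        $!t1
        # delete the collected Logger2 entry, keep what follows it
        D
    }
}' log.entry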
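For comparison, the same filtering can be sketched in awk, which some readers may find easier to follow. The helper name drop_entries is hypothetical, and the sketch assumes, as the sed script does, that every entry starts with INFO, DEBUG, TRACE or ERROR at the beginning of a line and that the logger name appears on that first line.

```shell
# drop_entries NAME FILE (hypothetical helper)
# Prints FILE without the log entries whose first line contains NAME,
# including the continuation lines that belong to those entries.
drop_entries() {
    awk -v name="$1" '
        # A line starting with a log level opens a new entry; decide
        # here whether the whole entry is to be skipped.
        /^(INFO|DEBUG|TRACE|ERROR)/ { skip = (index($0, name) > 0) }
        # Print every line that does not belong to a skipped entry.
        !skip
    ' "$2"
}

# Usage:
#   drop_entries Logger2 log.entry
```

The decision is made once per entry and carried over to its continuation lines, so no lines need to be joined.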
