Filtering multi-lines from a log

awk, filter, grep, perl, sed

Should this question be moved to stackoverflow instead?

I often need to read log files generated by java applications using log4j. Usually, a logged message (let's call it a log entry) spans over multiple lines. Example:

INFO  10:57:01.123 [Thread-1] [Logger1] This is a multi-line
text, two lines
DEBUG 10:57:01.234 [Thread-1] [Logger2] This entry takes 3 lines
line 2
line 3

Note that each log entry starts on a new line, and the first word of that line is TRACE, DEBUG, INFO or ERROR, followed by at least one space.
Here there are two log entries: the first at millisecond 123, the other at millisecond 234.

I would like a fast command (some combination of sed/grep/awk/etc.) to filter whole log entries (grep alone only filters individual lines), e.g. remove all log entries containing the text 'Logger2'.

I considered doing the following transformations:

1) join the lines belonging to the same log entry with a special sequence of characters (e.g. ##), so that every log entry takes exactly one line

INFO  10:57:01.123 [Thread-1] [Logger1] This is a multi-line##text, two lines
DEBUG 10:57:01.234 [Thread-1] [Logger2] This entry takes 3 lines##line 2##line 3

2) grep
3) split the lines back (i.e. replace ## with \n)
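The three steps above can be sketched as a pipeline. This is only a sketch under a few assumptions: the file name log.entry is hypothetical, entry headers start with one of the four level words followed by a space, and the \n in the sed replacement requires GNU sed:

```shell
# Sample log in the shape described above (hypothetical file name)
cat > log.entry <<'EOF'
INFO  10:57:01.123 [Thread-1] [Logger1] This is a multi-line
text, two lines
DEBUG 10:57:01.234 [Thread-1] [Logger2] This entry takes 3 lines
line 2
line 3
EOF

# 1) join each entry's continuation lines with '##'
# 2) drop the one-line entries mentioning Logger2
# 3) split the surviving entries back into their original lines
awk '/^(TRACE|DEBUG|INFO|ERROR) /{if (buf != "") print buf; buf = $0; next}
     {buf = buf "##" $0}
     END{if (buf != "") print buf}' log.entry |
grep -v 'Logger2' |
sed 's/##/\n/g'
```

Here awk does the joining (step 1) instead of sed, which sidesteps the tricky part: awk can buffer an entry until the next header line arrives.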

I got stuck at step 1 – I don't have enough experience with sed.

Perhaps the 3 steps above are not required; maybe sed can do all the work.

Best Answer

There is no need to combine several tools; the task can be done with sed alone:

sed '/^INFO\|^DEBUG\|^TRACE\|^ERROR/{
    /Logger2/{
        :1
        N
        /\nINFO\|\nDEBUG\|\nTRACE\|\nERROR/!s/\n//
        $!t1
        D
    }
}' log.entry
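For comparison, the same filter can also be written as a short awk program that sets a keep/drop flag at each entry header and prints lines while the flag is set. This is a sketch, not part of the answer above; it reuses the hypothetical file name log.entry:

```shell
# Sample log in the shape described in the question
cat > log.entry <<'EOF'
INFO  10:57:01.123 [Thread-1] [Logger1] This is a multi-line
text, two lines
DEBUG 10:57:01.234 [Thread-1] [Logger2] This entry takes 3 lines
line 2
line 3
EOF

# At each entry header, keep the entry unless it mentions Logger2;
# the bare pattern 'keep' prints every line while the flag is true,
# so continuation lines inherit the decision made at the header.
awk '/^(TRACE|DEBUG|INFO|ERROR) /{keep = !/Logger2/}
     keep' log.entry
```

Unlike the join/grep/split pipeline, this never rewrites the lines, so there is no risk of a ## marker colliding with text that already appears in a message.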