Getting the last line that matches a pattern in multiple files

awk, cut, grep, sort, text-processing

I have an application that outputs a set of log files to a central directory like this:

/tmp/experiment/log/    
├── node01.log
├── node02.log
├── node03.log
├── node04.log
├── node05.log
├── node06.log

Inside each file, various measurements are recorded over the lifetime of each node's process, so the lines look like this:

prop1=5, ts=X, node01
prop2=3, ts=X, node01
prop1=7, ts=Y, node01
...

I'm struggling to write a command that can process all the files and output the LAST reading of a given property, ideally producing something like this:

node01, prop1=7, ts=...
node02, prop1=9, ts=...
node03, prop1=3, ts=...

Any suggestions? I started using a combination of grep, cut, sort, uniq like this:

$ grep -sirh "prop1" /tmp/experiment/log/ | \
   cut --delimiter=, --fields=1,4 | uniq | sort | \
   tail -n 14    # this example had 14 log files

but it only worked partially: in some experiments it would print multiple records from the same log and leave out other logs entirely.
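Part of the trouble is that uniq only collapses adjacent duplicate lines, so it has to run after sort, and tail -n 14 keeps the last 14 lines of the merged, sorted stream, which are not necessarily one per node. For comparison, a minimal per-file sketch (assuming the directory layout shown above) sidesteps both issues by treating "the last matching line" as a per-file question:

$ for f in /tmp/experiment/log/node*.log; do
>    grep "prop1" "$f" | tail -n 1    # last prop1 reading in this file
> done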

I moved on to awk with this:

$ awk -F":" '/prop1/ { print $NF $2}' /tmp/experiment/log/node*.log | \
   awk 'END { print }'

and hit the problem that, when I pass multiple input files, it only gives me the last line of the last log file instead of one output line per log file (the END block runs just once, after all input has been read).

Any suggestions on how to accomplish this?

Best Answer

Take a look at the BEGINFILE and ENDFILE blocks (specific to GNU awk). You could run something along the lines of:

awk 'BEGINFILE { a = "" }                          ## reset at the start of each file
     /prop1/   { a = $NF $2 $1 }                   ## Change this if necessary
     ENDFILE   { if (a != "") print FILENAME, a }' ./node*.log
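
If GNU awk is not available, a portable sketch with the same effect keeps the last matching line of each file in an array keyed by FILENAME and prints everything in the END block (array iteration order is unspecified in awk, hence the trailing sort):

awk '/prop1/ { a[FILENAME] = $0 }
     END     { for (f in a) print f ", " a[f] }' ./node*.log | sort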