Getting the last line that matches a pattern in multiple files

awk, cut, grep, sort, text-processing

I have an application that outputs a set of log files to a central directory like this:

/tmp/experiment/log/    
├── node01.log
├── node02.log
├── node03.log
├── node04.log
├── node05.log
├── node06.log

Inside each file, various measurements are recorded over the lifetime of each node's process, so the lines look like this:

prop1=5, ts=X, node01
prop2=3, ts=X, node01
prop1=7, ts=Y, node01
...

I'm struggling to write a command that can process all the files and output the LAST reading of a given property, ideally producing something like this:

node01, prop1=7, ts=...
node02, prop1=9, ts=...
node03, prop1=3, ts=...

Any suggestions? I started using a combination of grep, cut, sort, uniq like this:

$ grep -sirh "prop1" /tmp/experiment/log/ | \
   cut --delimiter=, --fields=1,4 | uniq | sort | \
   tail -n 14    # this example had 14 log files

but it only worked partially: in some experiments it would print multiple records from the same log and leave out other logs entirely.
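Part of the trouble is that uniq only collapses adjacent duplicate lines, so it has to run after sort, and tail -n 14 keeps the last 14 lines of the merged, sorted stream, which are not necessarily one per node. For comparison, a minimal per-file sketch (assuming the directory layout shown above) sidesteps both issues by treating "the last matching line" as a per-file question:

$ for f in /tmp/experiment/log/node*.log; do
>    grep "prop1" "$f" | tail -n 1    # last prop1 reading in this file
> done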

I moved on to awk with this:

$ awk -F":" '/prop1/ { print $NF $2}' /tmp/experiment/log/node*.log | \
   awk 'END { print }'

and hit the problem that, when I pass multiple input files, it only gives me the last line of the last log file instead of one output line per log file (the END block runs just once, after all input has been read).

Any suggestions on how to accomplish this?

Best Answer

Take a look at the BEGINFILE and ENDFILE blocks (specific to GNU awk). You could run something along the lines of:

awk 'BEGINFILE { a = "" }                          ## reset at the start of each file
     /prop1/   { a = $NF $2 $1 }                   ## Change this if necessary
     ENDFILE   { if (a != "") print FILENAME, a }' ./node*.log
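
If GNU awk is not available, a portable sketch with the same effect keeps the last matching line of each file in an array keyed by FILENAME and prints everything in the END block (array iteration order is unspecified in awk, hence the trailing sort):

awk '/prop1/ { a[FILENAME] = $0 }
     END     { for (f in a) print f ", " a[f] }' ./node*.log | sort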