text-processing,sed,awk,grep – Print Only Unique Lines That Appear Last in Logfile Based on Date/Time

I'm working with a logfile with the following format:

Oct 12 01:28:26 server program: 192.168.1.105 text for 1.105 
Oct 12 01:30:00 server program: 192.168.1.104 text for 1.104 
Oct 12 01:30:23 server program: 192.168.1.103 text for 1.103
Oct 12 01:32:39 server program: 192.168.1.101 text for 1.101 
Oct 12 02:28:26 server program: 192.168.1.105 text for 1.105 
Oct 12 02:30:00 server program: 192.168.1.104 text for 1.104
Oct 12 02:30:23 server program: 192.168.1.103 text for 1.103 
Oct 12 02:32:39 server program: 192.168.1.101 text for 1.101 

I need to achieve this:

Oct 12 02:28:26 server program: 192.168.1.105 text for 1.105 
Oct 12 02:30:00 server program: 192.168.1.104 text for 1.104
Oct 12 02:30:23 server program: 192.168.1.103 text for 1.103
Oct 12 02:32:39 server program: 192.168.1.101 text for 1.101

How can I send the new output to a file? I have tried this:

awk '!_[$6]++ {a=$6} END{print a}' logfile

But it does not give me the expected results. How can I use awk or sed to print, for each matched string (the IP address here), only the line where it was last seen, that is, the latest one by date/time?

Best Answer

If you're going to do a second pass (which you pretty well have to), you may as well only store line numbers rather than full records. It makes the logic easier.

awk 'NR == FNR {if (z[$6]) y[z[$6]]; z[$6] = FNR; next} !(FNR in y)' logfile logfile

Proof of correctness:

At the end of processing each line, every line number processed so far is either a value in z, or an index (not value) in y, but never both.

The lines represented by values in z are, at the end of each iteration, exactly and only the latest records so far seen for each IP address.

The indices of y are, therefore, exactly the line numbers we do not want to print.
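
For readability, the same two-pass idea can be spelled out with comments and redirected to a file, which was also part of the question. This is only a sketch: the output name latest.log and the shortened sample data are assumptions made for illustration, and it tests `$6 in z` rather than `z[$6]`, which should behave the same here because line numbers start at 1.

```shell
# Sample input (a shortened copy of the log from the question).
cat > logfile <<'EOF'
Oct 12 01:28:26 server program: 192.168.1.105 text for 1.105
Oct 12 01:30:00 server program: 192.168.1.104 text for 1.104
Oct 12 02:28:26 server program: 192.168.1.105 text for 1.105
Oct 12 02:30:00 server program: 192.168.1.104 text for 1.104
EOF

awk '
    # First pass: NR == FNR is true only while reading the first file argument.
    NR == FNR {
        if ($6 in z) {
            y[z[$6]]      # this IP was seen before: mark its earlier line number as superseded
        }
        z[$6] = FNR       # remember the latest line number for this IP
        next
    }
    # Second pass: print only the lines that were never superseded.
    !(FNR in y)
' logfile logfile > latest.log    # latest.log is a hypothetical output name
```

The file is named twice on purpose so that awk reads it once per pass; the `>` redirection at the end is what sends the result to a file instead of the terminal.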
