I'm working with a logfile with the following format:
Oct 12 01:28:26 server program: 192.168.1.105 text for 1.105
Oct 12 01:30:00 server program: 192.168.1.104 text for 1.104
Oct 12 01:30:23 server program: 192.168.1.103 text for 1.103
Oct 12 01:32:39 server program: 192.168.1.101 text for 1.101
Oct 12 02:28:26 server program: 192.168.1.105 text for 1.105
Oct 12 02:30:00 server program: 192.168.1.104 text for 1.104
Oct 12 02:30:23 server program: 192.168.1.103 text for 1.103
Oct 12 02:32:39 server program: 192.168.1.101 text for 1.101
I need to achieve this:
Oct 12 02:28:26 server program: 192.168.1.105 text for 1.105
Oct 12 02:30:00 server program: 192.168.1.104 text for 1.104
Oct 12 02:30:23 server program: 192.168.1.103 text for 1.103
Oct 12 02:32:39 server program: 192.168.1.101 text for 1.101
How can I send the new output to a file? I have tried this:
awk '!_[$6]++ {a=$6} END{print a}' logfile
But it does not give the expected results. How can I use awk or sed to keep only one line per IP address, namely the last time that string was seen (or the latest one by date/time)?
Best Answer
If you're going to do a second pass (which you pretty well have to), you may as well only store line numbers rather than full records. It makes the logic easier.
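A minimal sketch of that two-pass idea, assuming the IP address is always field 6 and that the file is read twice by naming it twice on the awk command line. The array names z and y follow the proof below; the function name latest_per_ip is only an illustrative wrapper, not something from the original answer.

```shell
# Hypothetical helper: print only the latest line for each IP address (field 6).
latest_per_ip() {
    awk '
    # Pass 1: FNR == NR is true only while reading the first copy of the file.
    FNR == NR {
        # If this IP was seen before, its earlier line number is now stale,
        # so record that line number as an index of y.
        if ($6 in z) y[z[$6]] = 1
        # z maps each IP to the line number of its latest record so far.
        z[$6] = FNR
        next
    }
    # Pass 2: print every line whose number is not an index of y.
    !(FNR in y)
    ' "$1" "$1"
}
```

To send the result to a file, redirect the output, for example: latest_per_ip logfile > newfile (newfile is an assumed name). Lines come out in their original order, so the result stays sorted by time as long as the log itself is.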
Proof of correctness:
At the end of processing each line, every line number processed so far is either a value in z, or an index (not value) in y, but never both.
The lines represented by values in z are, at the end of each iteration, exactly and only the latest records so far seen for each IP address.
The indices of y are, therefore, the exact lines which we wish not to print.