Ubuntu – How to capture lines between two strings from a file, but only the last occurrence

command line, text processing

I have a log file that is written by a script and rotated daily. It will contain the strings

Transfer started at timestamp 

and

Transfer completed successfully at timestamp

repeatedly, because the transfer runs hourly. The timestamps are generated with date.

  • I want to capture the last instance of these two strings, and
    everything in between, into a separate file.
  • If the started string is found near the end of the log file, with no
    following completed string, I want to capture everything up to EOF
    and output an error message to say that the end string was not found.

I'm guessing I'll need sed or awk, but I'm really inexperienced with them. I want to use the command in a bash script and understand what each part is doing, so some explanation would be very useful.

An example chunk of log file:

ERROR - Second tech sync failed with rsync error code 255 at Fri May 27 13:50:4$
--------------------------------------------------------------------
After_sync script completed successfully with no errors.
Main script finished at Fri May 27 13:50:43 BST 2016 with PID of 18808.
--------------------------------------------------------------------
Transfer started at Fri May 27 13:50:45 BST 2016
Logs transferred successfully.
Images transferred successfully.
Hashes transferred successfully.
37 approvals pending.
Transfer completed successfully at Fri May 27 14:05:16 BST 2016
--------------------------------------------------------------------
Local repository verification started at Fri May 27 14:35:02 BST 2016
...

The desired output:

Transfer started at Fri May 27 13:50:45 BST 2016
Logs transferred successfully.
Images transferred successfully.
Hashes transferred successfully.
37 approvals pending.
Transfer completed successfully at Fri May 27 14:05:16 BST 2016

However, if the log file was like this:

ERROR - Second tech sync failed with rsync error code 255 at Fri May 27 13:50:4$
--------------------------------------------------------------------
After_sync script completed successfully with no errors.
Main script finished at Fri May 27 13:50:43 BST 2016 with PID of 18808.
--------------------------------------------------------------------
Transfer started at Fri May 27 13:50:45 BST 2016
Logs transferred successfully.
Images transferred successfully.
Hashes transferred successfully.

I would want to output:

Transfer started at Fri May 27 13:50:45 BST 2016
Logs transferred successfully.
Images transferred successfully.
Hashes transferred successfully.
ERROR: transfer not complete by end of log file

Best Answer

When I hear "I want to do X with the last something in the file", I think:

  • reverse the file
  • do X with the first something in the file
  • reverse the output of X

In code:

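tac logfile | awk '
    # Reading bottom-up: assume the transfer is incomplete until proven otherwise.
    BEGIN {text = "ERROR: transfer not complete by end of log file" ORS}
    # Found the end marker: discard the error message and anything logged after it.
    /^Transfer completed successfully/ {text = ""}
    # Save every line (still in reverse order), each followed by a newline.
    {text = text $0 ORS}
    # Found the start marker: print the captured block and stop reading.
    /^Transfer started at / {printf "%s", text; exit}
' | tac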

Since we are reading the log file from the bottom up, I start by assuming the transfer did not complete. If I see the "transfer completed" line, I throw away whatever has been captured so far (the error message and any lines logged after the transfer). Every line is then appended to the saved text. When the "transfer started" line appears, the whole of the last transfer in the file has been seen, so awk prints the captured text (still in reverse order) and exits; the final tac puts the lines back in their original order.
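
Since the question asks for the result in a separate file and for use inside a bash script, here is a minimal sketch of how the pipeline might be wrapped. The function name extract_last_transfer, its two arguments, the return codes, and the messages written to stderr are all my own assumptions, not anything from the original script. It also assumes GNU tac is available, as it is on Ubuntu.

```shell
#!/usr/bin/env bash
# Sketch only: the function name, arguments and return codes are hypothetical.
# Usage: extract_last_transfer LOGFILE OUTFILE
extract_last_transfer() {
    local logfile=$1 outfile=$2

    # Read the log bottom-up, capture lines until the most recent
    # "Transfer started" line, then restore the original line order.
    tac "$logfile" | awk '
        BEGIN {text = "ERROR: transfer not complete by end of log file" ORS}
        /^Transfer completed successfully/ {text = ""}
        {text = text $0 ORS}
        /^Transfer started at / {printf "%s", text; exit}
    ' | tac > "$outfile"

    # Empty output: the log had no "Transfer started" line at all.
    if [ ! -s "$outfile" ]; then
        echo "No transfer found in $logfile" >&2
        return 2
    fi

    # The error marker on the last line means no "completed" line
    # followed the last "started" line.
    if tail -n 1 "$outfile" | grep -q '^ERROR: transfer not complete'; then
        echo "Transfer in $logfile did not complete" >&2
        return 1
    fi
    return 0
}
```

A caller could then run something like extract_last_transfer /path/to/logfile last_transfer.txt (both paths are placeholders) and branch on the exit status: 0 for a complete transfer, 1 for one that was still running at the end of the log, 2 if no transfer was found.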
