Grep/Sed/Awk – Print Line Only if Next Line Does Not Match

awk, grep, sed

I am trying to search a log file for logged activities that did not complete. For example, I log "Starting activity for ID 1234" and, if the activity succeeds, the next line will be "ID 1234 completed successfully".

I'm trying to get the "Starting…" lines that are NOT followed by their corresponding "Completed" lines.

Example Log File

Starting activity for ID 1234
ID 1234 completed successfully
Starting activity for ID 3423
ID 3423 completed successfully
Starting activity for ID 9876
ID 9876 completed successfully
Starting activity for ID 99889
ID 99889 completed successfully
Starting activity for ID 10011
ID 10011 completed successfully
Starting activity for ID 33367
Starting activity for ID 936819
ID 936819 completed successfully

In this example, I would be looking for the output to be:

Starting activity for ID 33367

…because it's not followed by a "completed" line.

I've tried doing this with grep and awk, but have not had much success. I'm assuming it can be done with one of those tools, but my grep and awk chops are not advanced.

Looking for a quick and reliable grep or awk pattern to give the results I need here.

Best Answer

Here is an awk alternative:

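awk '
  /^Starting/ { I[$5] = $0                  }  # remember the line, keyed by its ID
  /^ID/       { delete I[$2]                }  # this ID completed, so forget it
  END         { for (key in I) print I[key] }  # what remains never completed
' infile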

Output:

Starting activity for ID 33367

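The associative array I holds each "Starting" line, keyed by its ID ($5). When the matching "ID … completed" line appears, that entry is deleted ($2 is the ID on those lines), so whatever is left at END is the set of activities that never completed. Note that for (key in I) visits entries in an unspecified order, so if several activities are unfinished, the output may not follow the order of the log.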
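
The answer above pairs lines by ID anywhere in the file. If the completion line must be the very next line, as the question's title suggests, a stricter check might look like the sketch below. The function name unfinished is hypothetical, and the code assumes the log format from the example: the ID is field 5 on "Starting" lines and field 2 on "ID" lines.

```shell
# Hypothetical helper: print every "Starting" line whose immediately
# following line is not the completion line for the same ID.
unfinished() {
  awk '
    # A start is pending and the current line does not complete it.
    # Appending "" forces a string comparison of the two IDs.
    prev != "" && !($1 == "ID" && ($2 "") == (id "")) { print prev }

    # The pending start has now been checked, one way or the other.
    { prev = "" }

    # Remember a new start so the next line can be checked against it.
    /^Starting/ { prev = $0; id = $5 }

    # A start on the last line has nothing after it, so it is unfinished.
    END { if (prev != "") print prev }
  ' "$@"
}
```

Called as unfinished infile (or with the log on standard input), this prints unfinished lines in log order, because each one is printed as soon as the following line is read rather than at END.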
