How to get the last occurrence of lines between two patterns from a file

Tags: awk, gawk, sed, text processing

I have a log file that reports on the output of a process. I'd like to extract all lines between the last occurrence of two patterns.

The patterns will be along the lines of:

Summary process started at <datestring>

and

Summary process finished at <datestring> with return code <num>

There will be several instances of these patterns throughout the file, along with a lot of other information. I'd like to print only the last occurrence.

I know that I can use:

sed -n '/StartPattern/,/EndPattern/p' FileName

to get the lines between the patterns, but I'm not sure how to get only the last instance.
Sed or awk solutions would be fine.
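To see the problem concretely, here is a minimal sketch with a made-up log (the dates, return codes and other lines are invented for illustration). The sed range address matches every "started ... finished" pair, so both runs are printed, not just the last one:

```shell
#!/bin/sh
# Made-up sample log: two complete runs surrounded by unrelated lines.
log=$(mktemp)
cat > "$log" <<'EOF'
unrelated line 1
Summary process started at 2024-01-01 10:00
first run output
Summary process finished at 2024-01-01 10:05 with return code 0
unrelated line 2
Summary process started at 2024-01-02 10:00
second run output
Summary process finished at 2024-01-02 10:07 with return code 1
unrelated line 3
EOF

# The range /start/,/end/ is re-armed after each end line, so every run matches.
all_runs=$(sed -n '/^Summary process started/,/^Summary process finished/p' "$log")
printf '%s\n' "$all_runs"
rm -f "$log"
```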

Edit:
I wasn't clear at all about the behaviour I want when multiple StartPatterns appear with no EndPattern between them, or when a StartPattern is found but no EndPattern follows it before the end of the file.

For multiple StartPatterns with no EndPattern between them, I'd only like the lines from the last StartPattern to the EndPattern.

For a StartPattern that reaches the end of the file without an EndPattern, I'd like everything up to EOF, followed by a warning message saying that EOF was reached.

Best Answer

You can always do:

tac < filename | sed '/EndPattern/,$!d;/StartPattern/q' | tac

If your system doesn't have GNU tac, you may be able to use tail -r instead (available on BSD-derived systems such as macOS).
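As a sketch of how that could look with the question's actual patterns, the pipeline can be wrapped in a small function that picks tac when it exists and falls back to tail -r otherwise. The function names reverse and last_summary, and the sample log, are hypothetical:

```shell
#!/bin/sh
# Reverse the lines of stdin: GNU tac if present, otherwise BSD/macOS tail -r.
reverse() {
  if command -v tac >/dev/null 2>&1; then
    tac
  else
    tail -r
  fi
}

# Hypothetical helper: print the last "started ... finished" block of file $1.
# Reading backwards, delete everything before the first (i.e. last) "finished"
# line, then quit after printing the first "started" line found after it.
last_summary() {
  reverse < "$1" |
    sed '/^Summary process finished/,$!d;/^Summary process started/q' |
    reverse
}

# Made-up log with two runs; only the second should be printed.
log=$(mktemp)
printf '%s\n' \
  'Summary process started at 2024-01-01 10:00' \
  'first run output' \
  'Summary process finished at 2024-01-01 10:05 with return code 0' \
  'Summary process started at 2024-01-02 10:00' \
  'second run output' \
  'Summary process finished at 2024-01-02 10:07 with return code 1' \
  'trailing noise' > "$log"

last=$(last_summary "$log")
printf '%s\n' "$last"
rm -f "$log"
```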

You can also do it with awk:

awk '
  inside {
    text = text $0 RS
    if (/EndPattern/) inside=0
    next
  }
  /StartPattern/ {
    inside = 1
    text = $0 RS
  }
  END {printf "%s", text}' < filename

But that means reading the whole file.

Note that it may give different results from the tac+sed+tac approach if there's another StartPattern between a StartPattern and the next EndPattern, if the last StartPattern has no matching EndPattern, or if some lines match both StartPattern and EndPattern.

awk '
  /StartPattern/ {
    inside = 1
    text = ""
  }
  inside {text = text $0 RS}
  /EndPattern/ {inside = 0} 
  END {printf "%s", text}' < filename

This version behaves more like the tac+sed+tac approach, except when the last StartPattern has no closing EndPattern.
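To check those differences, here is a small test harness (a sketch, assuming GNU tac; the function names and the two made-up inputs are hypothetical). It runs the first awk version, the second awk version and tac+sed+tac on an input with two StartPatterns before one EndPattern, and on an input whose last StartPattern is never closed:

```shell
#!/bin/sh
# First awk version from the answer: a StartPattern seen while inside is just collected.
awk_first() {
  awk '
    inside {
      text = text $0 RS
      if (/EndPattern/) inside=0
      next
    }
    /StartPattern/ {
      inside = 1
      text = $0 RS
    }
    END {printf "%s", text}' < "$1"
}

# Second awk version: every StartPattern restarts the collection.
awk_second() {
  awk '
    /StartPattern/ {
      inside = 1
      text = ""
    }
    inside {text = text $0 RS}
    /EndPattern/ {inside = 0}
    END {printf "%s", text}' < "$1"
}

# tac+sed+tac version, for reference.
tac_sed() {
  tac < "$1" | sed '/EndPattern/,$!d;/StartPattern/q' | tac
}

nested=$(mktemp)    # two StartPatterns before a single EndPattern
unclosed=$(mktemp)  # the last StartPattern never gets an EndPattern
printf '%s\n' StartPattern a1 StartPattern b1 EndPattern > "$nested"
printf '%s\n' StartPattern a1 EndPattern StartPattern b1 > "$unclosed"

nested_first=$(awk_first "$nested")
nested_second=$(awk_second "$nested")
nested_tac=$(tac_sed "$nested")
unclosed_first=$(awk_first "$unclosed")
unclosed_second=$(awk_second "$unclosed")
unclosed_tac=$(tac_sed "$unclosed")

show() { printf '== %s ==\n%s\n' "$1" "$2"; }
show 'nested, first awk' "$nested_first"
show 'nested, second awk' "$nested_second"
show 'nested, tac+sed+tac' "$nested_tac"
show 'unclosed, first awk' "$unclosed_first"
show 'unclosed, second awk' "$unclosed_second"
show 'unclosed, tac+sed+tac' "$unclosed_tac"
rm -f "$nested" "$unclosed"
```

On the nested input only the first awk version keeps both StartPatterns; on the unclosed input both awk versions print the trailing unclosed block, while tac+sed+tac prints the last complete block instead.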

The second awk version (the one that resets text on every StartPattern) is the closest to your edited requirements. Adding the warning is then simply:

awk '
  /StartPattern/ {
    inside = 1
    text = ""
  }
  inside {text = text $0 RS}
  /EndPattern/ {inside = 0} 
  END {
    printf "%s", text
    if (inside)
      print "Warning: EOF reached without seeing the end pattern" > "/dev/stderr"
  }' < filename
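Here is a sketch of that warning version with the question's patterns substituted in, run on a made-up log whose last run never logged its "finished" line. Stdout and stderr are captured separately so you can see that the block goes to stdout and the warning to stderr:

```shell
#!/bin/sh
# Made-up log whose last run is cut off before its "finished" line.
log=$(mktemp)
err=$(mktemp)
cat > "$log" <<'EOF'
Summary process started at 2024-01-01 10:00
first run output
Summary process finished at 2024-01-01 10:05 with return code 0
Summary process started at 2024-01-02 10:00
second run output, cut off
EOF

# Same logic as the warning version above, with the question's patterns.
out=$(awk '
  /^Summary process started/ {
    inside = 1
    text = ""
  }
  inside {text = text $0 RS}
  /^Summary process finished/ {inside = 0}
  END {
    printf "%s", text
    if (inside)
      print "Warning: EOF reached without seeing the end pattern" > "/dev/stderr"
  }' < "$log" 2> "$err")
warning=$(cat "$err")

printf 'stdout:\n%s\nstderr:\n%s\n' "$out" "$warning"
rm -f "$log" "$err"
```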

To avoid reading the whole file:

tac < filename | awk '
  /StartPattern/ {
    printf "%s", $0 RS text
    if (!inside)
      print "Warning: EOF reached without seeing the end pattern" > "/dev/stderr"
    exit
  }
  /EndPattern/ {inside = 1; text = ""}
  {text = $0 RS text}'
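A sketch of the tac+awk version with the question's patterns, wrapped in a hypothetical last_run function (assumes GNU tac). Because awk sees the file backwards, lines are prepended to text, and awk exits as soon as it reaches the last "started" line. The two made-up logs exercise the closed and the unclosed cases:

```shell
#!/bin/sh
# Hypothetical wrapper around the answer's tac|awk version.
last_run() {
  tac < "$1" | awk '
    /^Summary process started/ {
      printf "%s", $0 RS text
      if (!inside)
        print "Warning: EOF reached without seeing the end pattern" > "/dev/stderr"
      exit
    }
    /^Summary process finished/ {inside = 1; text = ""}
    {text = $0 RS text}'
}

closed=$(mktemp); unclosed=$(mktemp); err=$(mktemp)
printf '%s\n' \
  'Summary process started at 2024-01-01 10:00' 'run 1' \
  'Summary process finished at 2024-01-01 10:05 with return code 0' \
  'noise after the run' > "$closed"
printf '%s\n' \
  'Summary process started at 2024-01-01 10:00' 'run 1' \
  'Summary process finished at 2024-01-01 10:05 with return code 0' \
  'Summary process started at 2024-01-02 10:00' 'run 2, cut off' > "$unclosed"

# Each 2> truncates the error file, so every *_err holds only that run's warnings.
closed_out=$(last_run "$closed" 2> "$err");     closed_err=$(cat "$err")
unclosed_out=$(last_run "$unclosed" 2> "$err"); unclosed_err=$(cat "$err")

printf '== closed ==\n%s\n[stderr: %s]\n' "$closed_out" "$closed_err"
printf '== unclosed ==\n%s\n[stderr: %s]\n' "$unclosed_out" "$unclosed_err"
rm -f "$closed" "$unclosed" "$err"
```

The trailing noise after the last "finished" line is discarded in the closed case, and no warning is printed.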

Portability note: writing to "/dev/stderr" requires either a system that provides such a special file or an awk implementation that emulates it, such as gawk, mawk or busybox awk. Beware that on Linux, if stderr is open on a seekable file, writing through the real /dev/stderr puts the text at the beginning of the file instead of at the current position within it; the awk implementations that emulate /dev/stderr work around that issue.

On other systems, you can replace print ... > "/dev/stderr" with print ... | "cat>&2".
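For example, here is a sketch of the warning version rewritten that way, using placeholder StartPattern/EndPattern lines in a made-up input. The close() call is an addition that makes awk wait for cat before continuing; awk would also close the pipe on exit:

```shell
#!/bin/sh
# Made-up input whose last StartPattern is never closed.
input=$(mktemp); err=$(mktemp)
printf '%s\n' StartPattern a1 EndPattern StartPattern b1 > "$input"

# Same logic as the warning version, but the warning is piped to "cat>&2"
# instead of being written to "/dev/stderr".
out=$(awk '
  /StartPattern/ {
    inside = 1
    text = ""
  }
  inside {text = text $0 RS}
  /EndPattern/ {inside = 0}
  END {
    printf "%s", text
    if (inside) {
      print "Warning: EOF reached without seeing the end pattern" | "cat>&2"
      close("cat>&2")   # wait for cat to finish writing the warning
    }
  }' < "$input" 2> "$err")
warning=$(cat "$err")

printf 'stdout:\n%s\nstderr:\n%s\n' "$out" "$warning"
rm -f "$input" "$err"
```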
