How to get all lines between first and last occurrences of patterns

sed, text processing

How can I trim a file (well, an input stream) so that I only get the lines from the first occurrence of pattern foo to the last occurrence of pattern bar?

For instance, consider the following input:

A line
like
foo
this 
foo
bar
something
something else
foo
bar
and
the
rest

I expect this output:

foo
this 
foo
bar
something
something else
foo
bar

Best Answer

sed -n '/foo/{:a;N;/^\n/s/^\n//;/bar/{p;s/.*//;};ba};'

A sed range address /first/,/second/ works line by line. When a line matches /first/, the range opens and stays open until the next line that matches /second/; the commands attached to the range run on every line in between. After the range closes, sed looks for /first/ again, and this repeats until the end of the file.
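
To see that behaviour on the sample from the question, here is a small sketch. The function name range_demo is made up for illustration; only the range address itself comes from the explanation above.

```shell
# Hypothetical helper: feeds the sample input from the question to a plain range address.
range_demo() {
  printf '%s\n' 'A line' like foo 'this ' foo bar \
    something 'something else' foo bar and the rest |
    sed -n '/foo/,/bar/p'
}

range_demo
```

Each range closes at the nearest bar, so the lines "something" and "something else", which sit between the two ranges, are dropped.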

That's not what we need: we have to reach the last match of the /second/ pattern. So the command only looks for the first occurrence of /foo/. Once it is found, the loop labelled a starts. N appends the next line to the pattern space, and we check whether the pattern space matches /bar/. If it does, we print the pattern space and clear it. Either way, ba jumps back to the start of the loop. Lines collected after the last bar are never printed, because with -n nothing is output when N runs out of input.

After the pattern space has been cleared, the next N leaves a newline at its start, so /^\n/s/^\n// removes it. I'm sure there is a much better solution; unfortunately, none came to mind.

Hope everything is clear.
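
As one possible alternative, the same idea (start at the first foo, buffer lines, flush the buffer each time bar shows up) can be sketched in awk. This is not the answer's method; the function name first_foo_to_last_bar and the variable names started and pending are illustrative assumptions.

```shell
first_foo_to_last_bar() {
  awk '
    # Start collecting at the first line that matches foo.
    !started && /foo/ { started = 1 }
    started {
      pending = pending $0 ORS
      # A bar line proves that everything buffered so far belongs in the output.
      if (/bar/) { printf "%s", pending; pending = "" }
    }
  '
}

printf '%s\n' 'A line' like foo 'this ' foo bar \
  something 'something else' foo bar and the rest |
  first_foo_to_last_bar
```

Lines buffered after the last bar are simply never printed, and empty lines inside the range are kept as they are.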

Related Solutions

How to get the last occurrence of lines between two patterns from a file

You can always do:

tac < filename | sed '/EndPattern/,$!d;/StartPattern/q' | tac

If your system doesn't have GNU tac, you may be able to use tail -r instead.
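
A small sketch of that pipeline on made-up input, picking whichever reversing tool is available. The helper names reverse_lines and last_block and the sample lines are assumptions for illustration only.

```shell
# Use tac where it exists, otherwise fall back to tail -r.
reverse_lines() {
  if command -v tac >/dev/null 2>&1; then tac; else tail -r; fi
}

# Print the last StartPattern..EndPattern block found on standard input.
last_block() {
  reverse_lines | sed '/EndPattern/,$!d;/StartPattern/q' | reverse_lines
}

printf '%s\n' a 'StartPattern 1' x 'EndPattern 1' b \
  'StartPattern 2' y 'EndPattern 2' c |
  last_block
```

On the reversed stream, sed drops everything before the first EndPattern and quits at the first StartPattern after it, so only the final block survives the second reversal.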

You can also do it like:

awk '
  inside {
    text = text $0 RS
    if (/EndPattern/) inside=0
    next
  }
  /StartPattern/ {
    inside = 1
    text = $0 RS
  }
  END {printf "%s", text}' < filename

But that means reading the whole file.

Note that it may give different results from the tac approach if there's another StartPattern between a StartPattern and the next EndPattern, if the last StartPattern has no closing EndPattern, or if some lines match both StartPattern and EndPattern.

awk '
  /StartPattern/ {
    inside = 1
    text = ""
  }
  inside {text = text $0 RS}
  /EndPattern/ {inside = 0} 
  END {printf "%s", text}' < filename

Would make it behave more like the tac+sed+tac approach (except for the unclosed trailing StartPattern case).
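
To make the difference concrete, here is a sketch that runs both awk variants on a made-up input where a second StartPattern appears before the first EndPattern. The function names first_variant, second_variant and sample are illustrative only.

```shell
# Variant 1: keeps collecting until EndPattern, even across a nested StartPattern.
first_variant() {
  awk '
    inside {
      text = text $0 RS
      if (/EndPattern/) inside = 0
      next
    }
    /StartPattern/ { inside = 1; text = $0 RS }
    END { printf "%s", text }'
}

# Variant 2: restarts the buffer on every StartPattern.
second_variant() {
  awk '
    /StartPattern/ { inside = 1; text = "" }
    inside { text = text $0 RS }
    /EndPattern/ { inside = 0 }
    END { printf "%s", text }'
}

sample() { printf '%s\n' 'StartPattern 1' x 'StartPattern 2' y 'EndPattern' z; }

echo '--- variant 1'; sample | first_variant
echo '--- variant 2'; sample | second_variant
```

Variant 1 keeps the block starting at "StartPattern 1", while variant 2 starts at "StartPattern 2", which is what the tac pipeline would also select.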

The second awk variant (the one that resets text on every StartPattern) seems to be the closest to your edited requirements. Adding the warning would simply be:

awk '
  /StartPattern/ {
    inside = 1
    text = ""
  }
  inside {text = text $0 RS}
  /EndPattern/ {inside = 0} 
  END {
    printf "%s", text
    if (inside)
      print "Warning: EOF reached without seeing the end pattern" > "/dev/stderr"
  }' < filename

To avoid reading the whole file:

tac < filename | awk '
  /StartPattern/ {
    printf "%s", $0 RS text
    if (!inside)
      print "Warning: EOF reached without seeing the end pattern" > "/dev/stderr"
    exit
  }
  /EndPattern/ {inside = 1; text = ""}
  {text = $0 RS text}'

Portability note: for /dev/stderr, you need either a system with such a special file (beware that on Linux if stderr is open on a seekable file that will write the text at the beginning of the file instead of the current position within the file) or an awk implementation that emulates it like gawk, mawk or busybox awk (those work around the Linux issue mentioned above).

On other systems, you can replace print ... > "/dev/stderr" with print ... | "cat>&2".
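
As a sketch of that substitution, here is the warning variant with the pipe to cat in place of the /dev/stderr redirection. The wrapper name last_block_with_warning and the two sample lines are assumptions for illustration.

```shell
# Hypothetical wrapper around the warning variant; stderr is reached through cat.
last_block_with_warning() {
  awk '
    /StartPattern/ { inside = 1; text = "" }
    inside { text = text $0 RS }
    /EndPattern/ { inside = 0 }
    END {
      printf "%s", text
      if (inside)
        print "Warning: EOF reached without seeing the end pattern" | "cat >&2"
    }'
}

# The last StartPattern is never closed, so the warning should go to stderr.
printf '%s\n' 'StartPattern' 'unfinished' | last_block_with_warning
```

Running it with 2>/dev/null should leave only the two collected lines on standard output.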

Sed on OS X – extract all text that is between square brackets

awk works well for this too: using [ or ] as the field separator, print every even-numbered field:

awk -F '[][]' '{for (i=2; i<=NF; i+=2) {printf "%s ", $i}; print ""}' file

With sed, I'd write

sed -E 's/(^|\])[^[]*($|\[)/ /g' file
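
Both commands can be tried on a single made-up line, as in the sketch below. The sample text and the function names brackets_awk and brackets_sed are assumptions for illustration; the awk and sed programs are the ones shown above, reading standard input instead of a file.

```shell
# Made-up sample line for illustration.
sample_line='foo [bar] baz [qux] end'

# awk: with [ or ] as separator, fields 2, 4, ... are the bracketed parts.
brackets_awk() {
  awk -F '[][]' '{for (i=2; i<=NF; i+=2) {printf "%s ", $i}; print ""}'
}

# sed: replace each stretch outside the brackets (brackets included) with a space.
brackets_sed() {
  sed -E 's/(^|\])[^[]*($|\[)/ /g'
}

printf '%s\n' "$sample_line" | brackets_awk
printf '%s\n' "$sample_line" | brackets_sed
```

Both should print bar and qux separated by a space; the leading and trailing whitespace differs between the two.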
