Text Processing – Get 2 lines with exact text between them

awk, sed, text processing

I have a file containing an unknown number of text blocks. Each block consists of a starting keyword "Start", an ending keyword "End", and optional lines between them, each containing the keyword "Disk". I need to get rid of the blocks that have nothing between Start and End; see the example.

I am processing input like this:

Server1:Start
Server1:End
Server2:Start
Disk1
Disk2
Server2:End
Server3:Start
Disk1
Server3:End

and my desired output is this:

Server2:Start
Disk1
Disk2
Server2:End
Server3:Start
Disk1
Server3:End

I know that I can use 'awk' or 'sed' to find the text between two lines, but I do not know what to do when there are multiple occurrences of these two lines, or when there is no text between them.

I am running Ubuntu 17.10.

Looking forward to any help.

Edit: I deleted the post the first time because I thought I could do it with sed -e '/Start/,/End/d', but this actually removes everything.

Best Answer

To delete back-to-back Start and End lines, this should work in GNU sed:

$ sed -e '/Start/ {N; /^\(.*\):Start\n\1:End$/d }' < input

If we see Start, we load the next line with N, then check whether the buffer (the pattern space) is exactly Somename:Start\nSomename:End, with the same Somename on both lines (\n is a newline). If so, we delete it. Here, \1 is a back-reference to the first group \(...\) and matches the same string that the group matched. .* just means any number (*) of any characters (.).
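As a quick check of the back-reference, here is a small run on made-up input (the names A, B and C are not from the question). It assumes GNU sed, as the answer does. The first pair has the same name on both lines, so it is deleted; the second pair has different names, so \1 fails to match and both lines are kept.

```shell
# Hypothetical input: the first pair has matching names, the second does not.
kept=$(printf '%s\n' A:Start A:End B:Start C:End |
  sed -e '/Start/ {N; /^\(.*\):Start\n\1:End$/d }')

# Show what survived the filter.
printf '%s\n' "$kept"
```

This should print only B:Start and C:End.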

Using sed -e '/Start/,/End/d' would indeed delete every single line, since the range covers every line from one matching the starting pattern through the next one matching the ending pattern, both included. Everything in the input lies within such a range, so everything is deleted.
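For comparison, the same filtering can be sketched in awk. This is not from the answer above, only a minimal sketch assuming that every block begins with a line ending in :Start and ends with a line ending in :End, as in the sample. The function name drop_empty_blocks and the variables held and have are made up for illustration. Unlike the sed command, it does not check that the server name is the same on both lines.

```shell
# Hypothetical helper: reads blocks on stdin, writes the non-empty ones to stdout.
drop_empty_blocks() {
  awk '
    /:Start$/ {                          # a new block begins
      if (have) print held               # previous Start had no End: keep it
      held = $0; have = 1; next          # hold this Start back for now
    }
    have && /:End$/ { have = 0; next }   # End directly after Start: drop both
    have { print held; have = 0 }        # block has content: release the Start
    { print }
    END { if (have) print held }         # input ended right after a Start
  '
}

# Sample input from the question.
printf '%s\n' Server1:Start Server1:End Server2:Start Disk1 Disk2 Server2:End \
  Server3:Start Disk1 Server3:End | drop_empty_blocks
```

Holding the Start line back for one line is what lets awk decide, without reading ahead, whether the block is empty.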

Related Solutions

Grepping for a block of text with parts that can be optional

This should do it, I hope. Events go to a file named events, and messages go to stdout.

Save this file to myprogram.awk (for example):

#!/usr/bin/awk -f

BEGIN {
   s=0;  ### state. Active when parsing inside an event
   nevent=0;  ### Current event number
   printf "" > "events"
}

# Start of event
/^ *Data control raising event/ {
   s=1;
   dentries=0;
   nevent++   ### Count first, so this header and the warning use the same number
   print "*** Event number: " nevent >> "events"
}

# Standard event line
s==1 {
   print >> "events"
}

# DataChangeEntry line
/^ *==== DataChangeEntry/ {
   dentries ++
}

# End of event
s==1 && /^ *\]\]/ {
   s=0;
   print "" >> "events"
   if(dentries==0){
      print "Warning: Event " nevent " has no Data Entries"
   }
}

END {
   print "Total event count: " nevent
}

You can invoke it in different ways (the first one requires the script to be executable, e.g. after chmod +x myprogram.awk):

./myprogram.awk inputfile.txt
awk -f myprogram.awk inputfile.txt

Sample output:

Warning: Event 3 has no Data Entries
Total event count: 3

You can check all the events together in the file called events in the working directory.

How to get the last occurrence of lines between two patterns from a file

You can always do:

tac < fileName | sed  '/EndPattern/,$!d;/StartPattern/q' | tac

If your system doesn't have GNU tac, you may be able to use tail -r instead.

You can also do it like:

awk '
  inside {
    text = text $0 RS
    if (/EndPattern/) inside=0
    next
  }
  /StartPattern/ {
    inside = 1
    text = $0 RS
  }
  END {printf "%s", text}' < filename

But that means reading the whole file.

Note that it may give different results from the tac approach if there's another StartPattern between a StartPattern and the next EndPattern, if the last StartPattern has no matching EndPattern, or if there are lines matching both StartPattern and EndPattern.

awk '
  /StartPattern/ {
    inside = 1
    text = ""
  }
  inside {text = text $0 RS}
  /EndPattern/ {inside = 0} 
  END {printf "%s", text}' < filename

The version above behaves more like the tac+sed+tac approach (except for the unclosed trailing StartPattern case).
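To make the difference concrete, here is a small comparison on made-up input in which a second StartPattern appears before the first EndPattern. The sample lines and the variable names sample, first, second and reversed are assumptions for illustration; the awk and sed programs are the ones shown above, condensed.

```shell
# Hypothetical input: a second StartPattern appears before the first EndPattern.
sample=$(printf '%s\n' a 'StartPattern 1' x 'StartPattern 2' y 'EndPattern 1' z)

# First awk variant: a StartPattern seen while already inside is just appended.
first=$(printf '%s\n' "$sample" | awk '
  inside { text = text $0 RS; if (/EndPattern/) inside = 0; next }
  /StartPattern/ { inside = 1; text = $0 RS }
  END { printf "%s", text }')

# Second awk variant: every StartPattern restarts the collected text.
second=$(printf '%s\n' "$sample" | awk '
  /StartPattern/ { inside = 1; text = "" }
  inside { text = text $0 RS }
  /EndPattern/ { inside = 0 }
  END { printf "%s", text }')

# tac + sed + tac for comparison (needs tac, e.g. from GNU coreutils).
reversed=$(printf '%s\n' "$sample" | tac | sed '/EndPattern/,$!d;/StartPattern/q' | tac)

printf 'first:\n%s\n\nsecond:\n%s\n\ntac+sed+tac:\n%s\n' "$first" "$second" "$reversed"
```

On this input the first variant starts its output at "StartPattern 1", while the second variant and the tac pipeline both start at "StartPattern 2".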

The variant that resets text at every StartPattern seems to be the closest to your edited requirements. Adding the warning is then a small change:

awk '
  /StartPattern/ {
    inside = 1
    text = ""
  }
  inside {text = text $0 RS}
  /EndPattern/ {inside = 0} 
  END {
    printf "%s", text
    if (inside)
      print "Warning: EOF reached without seeing the end pattern" > "/dev/stderr"
  }' < filename

To avoid reading the whole file:

tac < filename | awk '
  /StartPattern/ {
    printf "%s", $0 RS text
    if (!inside)
      print "Warning: EOF reached without seeing the end pattern" > "/dev/stderr"
    exit
  }
  /EndPattern/ {inside = 1; text = ""}
  {text = $0 RS text}'

Portability note: for /dev/stderr, you need either a system that provides such a special file (beware that on Linux, if stderr is open on a seekable file, the text is written at the beginning of the file instead of at the current position within it) or an awk implementation that emulates it, such as gawk, mawk or busybox awk (those work around the Linux issue mentioned above).

On other systems, you can replace print ... > "/dev/stderr" with print ... | "cat>&2".
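A minimal sketch of that replacement, reduced to the warning alone so it can be tried on its own. The function name warn_unclosed and the two-line input are made up for illustration; the pipe to cat is the technique named above.

```shell
# Hypothetical check: warn on stderr if the last StartPattern is never closed.
warn_unclosed() {
  awk '
    /StartPattern/ { inside = 1 }
    /EndPattern/   { inside = 0 }
    END {
      if (inside)   # write through a child process whose stdout is our stderr
        print "Warning: EOF reached without seeing the end pattern" | "cat >&2"
    }'
}

# This input opens a block and never closes it, so the warning goes to stderr.
printf '%s\n' StartPattern x | warn_unclosed
```

Because the message goes through a pipe to cat, awk never opens /dev/stderr itself, so this form does not depend on that special file existing.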
