Grep Sed Awk – Extract Text Until First Blank Line

awk, regular expression, sed, sort

I have a file prova.txt like this:

Start to grab from here: 1
fix1
fix2
fix3
fix4
random1
random2
random3
random4

extra1
extra2
bla

Start to grab from here: 2
fix1
fix2
fix3
fix4
random1546
random2561

extra2
bla
bla

Start to grab from here: 1
fix1
fix2
fix3
fix4
random1
random22131

I need to extract everything from each "Start to grab from here" line up to the first blank line that follows it. The output should look like this:

Start to grab from here: 1
fix1
fix2
fix3
fix4
random1
random2
random3
random4

Start to grab from here: 2
fix1
fix2
fix3
fix4
random1546
random2561

Start to grab from here: 1
fix1
fix2
fix3
fix4
random1
random22131

As you can see, the number of lines after "Start to grab from here" varies, so grep's -A and -B flags don't work:

cat prova.txt | grep "Start to grab from here" -A 15 | grep -B 15 "^$" > output.txt

Can you help me find a way to capture everything from the first line (the one matching "Start to grab from here") up to the next blank line? I cannot predict how many random lines will follow "Start to grab from here".

Any Unix-compatible solution is appreciated (grep, sed, or awk is preferred over perl or similar).

EDITED: after the brilliant response by @john1024, I would like to know if it's possible to:

1° sort the blocks by their header (Start to grab from here: 1, then 1, then 2)

2° remove the 4 lines fix1, fix2, fix3, fix4 (their text varies, but there are always exactly 4)

3° optionally remove duplicate random lines, as the sort -u command would

The final output should look like this:

# fix lines removed - match 1 first time
Start to grab from here: 1
random1
random2
random3
random4

#fix lines removed - match 1 second time
Start to grab from here: 1
#random1 removed because it is a dupe
random22131

#fix lines removed - match 2 that comes after 1
Start to grab from here: 2
random1546
random2561

or

# fix lines removed - match 1 first time and the second too
Start to grab from here: 1
random1
random2
random3
random4
#random1 removed because it is a dupe
random22131

#fix lines removed - match 2 that comes after 1
Start to grab from here: 2
random1546
random2561

The second output is better than the first one. Some other Unix command magic is needed.

Best Answer

Using awk

Try:

$ awk '/Start to grab/,/^$/' prova.txt
Start to grab from here: 1
fix1
fix2
fix3
fix4
random1
random2
random3
random4

Start to grab from here: 2
fix1
fix2
fix3
fix4
random1546
random2561

Start to grab from here: 1
fix1
fix2
fix3
fix4
random1
random22131

/Start to grab/,/^$/ defines a range. It starts with any line that matches Start to grab and ends with the first empty line, ^$, that follows.
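
To make the range logic explicit, the same selection can be written with a flag variable. This is only a sketch: the function name grab_blocks and the variable grab are made up for illustration, and the sample input is piped in with printf rather than read from prova.txt.

```shell
# grab_blocks is a hypothetical helper name used only for this illustration.
grab_blocks() {
    awk '
        /Start to grab/ { grab = 1 }   # start marker: switch printing on
        grab                           # bare pattern: print the line while the flag is set
        /^$/            { grab = 0 }   # blank line: already printed above, now switch off
    ' "$@"
}

# Tiny sample input; "extra1" comes after the blank line and is not printed.
printf '%s\n' 'Start to grab from here: 1' 'random1' '' 'extra1' | grab_blocks
```

The flag form is longer than the range pattern, but it is easier to extend when lines inside a block need extra conditions.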

Using sed

With very similar logic:

$ sed -n '/Start to grab/,/^$/p' prova.txt
Start to grab from here: 1
fix1
fix2
fix3
fix4
random1
random2
random3
random4

Start to grab from here: 2
fix1
fix2
fix3
fix4
random1546
random2561

Start to grab from here: 1
fix1
fix2
fix3
fix4
random1
random22131

-n tells sed not to print anything unless we explicitly ask it to. /Start to grab/,/^$/p tells it to print any lines in the range defined by /Start to grab/,/^$/.
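
For the follow-up in the edited question (sort the blocks, drop the four lines after each header, remove duplicates), one possible approach is a single awk script. This is a sketch under several assumptions: every header ends in an integer, the four lines to drop are always the four lines directly after the header, and blocks with the same number should be merged with duplicates removed in first-seen order. The name merge_blocks and all variable names are hypothetical.

```shell
# merge_blocks is a hypothetical helper name; it is not part of the commands above.
merge_blocks() {
    awk '
        /Start to grab/ {
            key = $NF + 0                    # number at the end of the header (assumed integer)
            if (!(key in hdr)) { hdr[key] = $0; keys[++n] = key }
            grab = 1; skip = 4               # drop the next 4 lines (the fix lines)
            next
        }
        /^$/ { grab = 0; next }              # a blank line ends the current block
        grab {
            if (skip > 0) { skip--; next }
            if (!((key, $0) in seen)) {      # keep each line once per header number
                seen[key, $0] = 1
                body[key] = body[key] $0 "\n"
            }
        }
        END {
            # insertion sort of the header numbers, since POSIX awk has no built-in sort
            for (i = 2; i <= n; i++) {
                k = keys[i]
                for (j = i - 1; j >= 1 && keys[j] > k; j--) keys[j + 1] = keys[j]
                keys[j + 1] = k
            }
            for (i = 1; i <= n; i++) {
                if (i > 1) print ""          # blank line between merged blocks
                print hdr[keys[i]]
                printf "%s", body[keys[i]]
            }
        }
    ' "$@"
}

# Small sample in the same layout as prova.txt, with block 2 listed first.
printf '%s\n' 'Start to grab from here: 2' fix1 fix2 fix3 fix4 random1546 '' extra2 '' \
    'Start to grab from here: 1' fix1 fix2 fix3 fix4 random1 random1 random22131 | merge_blocks
```

On the real file it would be called as merge_blocks prova.txt. Unlike sort -u, this keeps the remaining lines in the order they first appear, which is what the second desired output in the question shows.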
