I have a file like the one below:
```
blablabla
blablabla
***
thingsIwantToRead1
thingsIwantToRead2
thingsIwantToRead3
blablabla
blablabla
```
I want to extract the paragraph with `thingsIwantToRead`. When I had to deal with this kind of problem before, I used AWK like this:
```
awk 'BEGIN{ FS="Separator above the paragraph"; RS="" } {print $2}' $file.txt | awk 'BEGIN{ FS="separator below the paragraph"; RS="" } {print $1}'
```
And it worked.
In this case, I tried setting `FS="***"`, `"\*{3}"`, `"\*\*"` (which doesn't work because AWK treats `\*` as a plain asterisk), `"\\*\\*"`, and every other regex I could think of, but nothing works (it prints nothing).
Do you know why?
If not, do you know another way to deal with my problem?
Below is an extract of the file I want to parse:
```
13.2000000000 , 3*0.00000000000 , 11.6500000000 , 3*0.00000000000 , 17.8800000000
Blablabla
SATELLITE EPHEMERIS
===================
Output frame: Mean of J2000
Epoch A E I RA AofP TA Flight Ang
*****************************************************************************************************************
2012/10/01 00:00:00.000 6998.239 0.001233 97.95558 77.41733 89.98551 290.75808 359.93398
2012/10/01 00:05:00.000 6993.163 0.001168 97.95869 77.41920 124.72698 274.57362 359.93327
2012/10/01 00:10:00.000 6987.347 0.001004 97.96219 77.42327 170.94020 246.92395 359.94706
2012/10/01 00:15:00.000 6983.173 0.000893 97.96468 77.42930 224.76158 211.67042 359.97311
<np>
----------------
Predicted Orbit:
----------------
Blablabla
```
And I want to extract:
```
2012/10/01 00:00:00.000 6998.239 0.001233 97.95558 77.41733 89.98551 290.75808 359.93398
2012/10/01 00:05:00.000 6993.163 0.001168 97.95869 77.41920 124.72698 274.57362 359.93327
2012/10/01 00:10:00.000 6987.347 0.001004 97.96219 77.42327 170.94020 246.92395 359.94706
2012/10/01 00:15:00.000 6983.173 0.000893 97.96468 77.42930 224.76158 211.67042 359.97311
```
And this is the command I tried, to get the numbers after the line of `*`s:

```
awk 'BEGIN{ FS="\\*{2,}"; RS="" } {print $2}' file | awk 'BEGIN{ FS="<np>"; RS="" } {print $1}'
```
Best Answer
Tell awk to print between the two delimiters. Specifically:
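A minimal sketch of that idea. It assumes the table starts after a line containing four or more `*` and ends at the `<np>` line, as in the extract above. The awk range pattern `/start/,/end/` prints every line from a match of the first regex through the next match of the second. The file name `ephemeris.txt` and the helper `print_block` are hypothetical, and the sample data is trimmed to two rows:

```shell
# Hypothetical sample input, trimmed from the extract in the question.
cat > ephemeris.txt <<'EOF'
Epoch A E I RA AofP TA Flight Ang
**********************************
2012/10/01 00:00:00.000 6998.239 0.001233 97.95558 77.41733 89.98551 290.75808 359.93398
2012/10/01 00:05:00.000 6993.163 0.001168 97.95869 77.41920 124.72698 274.57362 359.93327
<np>
----------------
Predicted Orbit:
EOF

# Hypothetical helper: print from the first line containing 4+ asterisks
# through the next <np> line, both delimiter lines included.
print_block() {
    awk '/\*\*\*\*/,/<np>/' "$1"
}

print_block ephemeris.txt
```

The single quotes keep the shell away from the backslashes, so awk sees the regex literal `/\*\*\*\*/` and matches four literal asterisks.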
That will also print the lines containing the delimiters, so you can remove them with:
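One way to drop the two delimiter lines, sketched here with `sed '1d;$d'` (delete the first and the last line of the range output). Using `sed` for this is my choice, not necessarily the original command, and it assumes the range matches exactly once in the file. `ephemeris.txt` and `print_table` are hypothetical names:

```shell
# Hypothetical sample input, trimmed from the extract in the question.
cat > ephemeris.txt <<'EOF'
Epoch A E I RA AofP TA Flight Ang
**********************************
2012/10/01 00:00:00.000 6998.239 0.001233 97.95558 77.41733 89.98551 290.75808 359.93398
2012/10/01 00:05:00.000 6993.163 0.001168 97.95869 77.41920 124.72698 274.57362 359.93327
<np>
----------------
Predicted Orbit:
EOF

# Hypothetical helper: same range as before, then sed deletes the first
# line (the asterisk delimiter) and the last line (<np>) of that output.
print_table() {
    awk '/\*\*\*\*/,/<np>/' "$1" | sed '1d;$d'
}

print_table ephemeris.txt
```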
Alternatively, you can set a variable to true if a line matches the 1st delimiter and to false when it matches the second and only print when it is true:
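A sketch of the flag approach, assuming the same delimiters as above. The flag is called `a`, matching the explanation that follows; `ephemeris.txt` and `extract_table` are hypothetical names:

```shell
# Hypothetical sample input, trimmed from the extract in the question.
cat > ephemeris.txt <<'EOF'
Epoch A E I RA AofP TA Flight Ang
**********************************
2012/10/01 00:00:00.000 6998.239 0.001233 97.95558 77.41733 89.98551 290.75808 359.93398
2012/10/01 00:05:00.000 6993.163 0.001168 97.95869 77.41920 124.72698 274.57362 359.93327
<np>
----------------
Predicted Orbit:
EOF

# Hypothetical helper: print only the lines between the delimiters.
extract_table() {
    awk '
        /\*\*\*\*/ { a = 1; next }  # asterisk line: turn printing on, skip the line itself
        /<np>/     { a = 0 }        # end marker: turn printing off
        a          { print }        # print only while the flag is set
    ' "$1"
}

extract_table ephemeris.txt
```

Because `a` is cleared before the `print` rule is evaluated, the `<np>` line itself is not printed either, so no extra filtering is needed.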
The command above will set `a` to 1 if the current line matches 4 or more `*`, and will also skip to the `next` line. This means that the `***` line will never be printed.

This was in answer to the original, misunderstood, version of the question. I'm leaving it here since it can be useful in a slightly different situation.
First of all, you don't want `FS` (field separator), you want `RS` (record separator). Then, to pass a literal `*`, you need to escape it twice: once to escape the `*` and once to escape the backslash (otherwise, awk will treat `\*` as a string escape sequence, the way it treats `\r` or `\t`). Then, you print the 2nd "line" (record):

To avoid the blank lines around the output, use:
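A minimal sketch of both steps (printing the 2nd record, then a variant without the surrounding blank lines). It assumes an awk that treats a multi-character `RS` as a regular expression, such as GNU awk or mawk; POSIX leaves multi-character `RS` unspecified. The file `paragraphs.txt` and the helper `second_paragraph` are hypothetical, and folding the newlines into `RS` is my choice of fix for the blank lines:

```shell
# Hypothetical sample in the format of the original question,
# with a *** line after each paragraph.
cat > paragraphs.txt <<'EOF'
blablabla
blablabla
***
thingsIwantToRead1
thingsIwantToRead2
thingsIwantToRead3
***
blablabla
blablabla
EOF

# Print the 2nd record. The awk string "\\*" becomes \*, which the regex
# engine reads as a literal asterisk. The newlines next to *** stay in the
# record, so the output has a blank line before and after the paragraph.
awk 'BEGIN { RS = "\\*\\*\\*" } NR == 2 { print }' paragraphs.txt

# Hypothetical helper: include the newlines around *** in RS as well,
# so the record contains only the paragraph lines.
second_paragraph() {
    awk 'BEGIN { RS = "\n\\*\\*\\*\n" } NR == 2 { print }' "$1"
}

second_paragraph paragraphs.txt
```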
Note that this assumes a `***` after each paragraph, not only after the first one as you show.