I have a program that prints lines of text ("paragraphs") separated by '--' lines. For example, it might print:
--
are you happy
--
I am hungry
are you
--
are(you hungry
too
I want to pipe that into another program (sed, maybe?) and get back only the paragraphs that start with a given word (e.g. "are"). For the example above, filtering for paragraphs that begin with "are" should give:
--
are you happy
--
are(you hungry
too
The program prints a potentially very large number of "paragraphs", but I expect only a few to match, so I would prefer to filter its output as a stream (rather than writing everything to a huge file and then filtering that).
Best Answer
AWK
Using GNU awk or mawk:
This sets the variable word to the word to match at the beginning of the record, and RS (the record separator) to '--' followed by a newline ('\n'). Then, for any record whose first field starts with that word ($1~"^"word), it prints a formatted record: a leading '--' and a newline, followed by the record exactly as found.
GREP
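The awk command itself did not survive in this copy of the answer. The sketch below is reconstructed from the description above, so treat its exact spelling as an assumption. It needs GNU awk or mawk, since both treat a multi-character RS as a regular expression, and the wrapper name filter_paras is hypothetical:

```shell
# Hypothetical wrapper: print the "--"-separated paragraphs whose first
# field starts with the word given as $1. Requires gawk or mawk.
filter_paras() {
    awk -v word="$1" -v RS='--\n' '
        # RS swallowed the "--" line, so print it again before each match.
        $1 ~ "^" word { printf("--\n%s", $0) }
    '
}

# The sample output from the question, fed through the filter.
printf '%s\n' '--' 'are you happy' '--' 'I am hungry' 'are you' \
    '--' 'are(you hungry' 'too' | filter_paras are
```

Because awk reads and prints one record at a time, it can sit directly on the producing program's pipe. Note that word is spliced into a regular expression, so a word containing characters such as '.' or '(' would need escaping.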
Using grep (GNU grep, for the -z option):
Description(s)
For the following descriptions, the PCRE option (?x) is used to add (a lot of) explanatory comments, and spaces, inline with the actual working regex. If the comments (each running up to the next newline) and most of the spaces are removed, the resulting string is still the same regex. This allows the regex to be described in detail within working code, which makes maintenance a lot easier.
Option 1 regex
(?x)--\nare(?:[^\n]*\n)+?(?=--|\Z)
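The grep command line is also missing here. A plausible invocation, assuming GNU grep with -z (NUL-free input becomes one single "line"), -o (print only the matched text) and -P (PCRE), applies Option 1 as below; tr only deletes the NUL byte that -z appends after each match. The sketch also runs a spaced-out copy of the same regex to show the (?x) behaviour described above. Because the whole input is one line, grep has to hold all of it in memory, so unlike the awk version this does not really stream:

```shell
# The question's sample output, used as input.
sample() {
    printf '%s\n' '--' 'are you happy' '--' 'I am hungry' 'are you' \
        '--' 'are(you hungry' 'too'
}

# Option 1 as written: "--", newline, "are", then whole lines taken lazily
# until the text ahead is another "--" or the end of the input.
compact=$(sample | grep -zoP '(?x)--\nare(?:[^\n]*\n)+?(?=--|\Z)' | tr -d '\0')

# The same regex spaced out; under (?x) the spaces are ignored and "#"
# starts a comment running to the end of the pattern line.
spaced=$(sample | grep -zoP '(?x) --\n are  (?:[^\n]*\n)+?  (?=--|\Z)  # one paragraph' | tr -d '\0')

printf '%s\n' "$compact"
```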
Option 2 regex
(?sx)--\nare.*?(?=\n--|\Z)\n
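Option 2 reaches the same paragraphs differently: (?s) lets . match newlines too, so .*? grows one character at a time until what follows is either a newline plus "--" or the final newline of the input (\Z matches just before it), and the closing \n then consumes that newline. A sketch assuming GNU grep with -z, -o and -P, using the question's sample as input:

```shell
# The question's sample output, used as input.
sample() {
    printf '%s\n' '--' 'are you happy' '--' 'I am hungry' 'are you' \
        '--' 'are(you hungry' 'too'
}

# Option 2: (?s) makes "." match newlines as well; the lazy .*? stops as
# soon as the lookahead sees "\n--" or \Z, and the final \n eats the newline.
sample | grep -zoP '(?sx)--\nare.*?(?=\n--|\Z)\n' | tr -d '\0'
```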
Option 3 regex
(?xs)--\nare(?:(?!\n--).)*\n
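Option 3 uses a tempered greedy token instead of a lazy quantifier: (?:(?!\n--).)* greedily takes any character (newlines included, thanks to s) as long as it does not start a "\n--" sequence, and the trailing \n then backtracks one character if needed so the match ends on a newline. Again a sketch assuming GNU grep with -z, -o and -P, on the question's sample input:

```shell
# The question's sample output, used as input.
sample() {
    printf '%s\n' '--' 'are you happy' '--' 'I am hungry' 'are you' \
        '--' 'are(you hungry' 'too'
}

# Option 3: each "." is accepted only where the text ahead is not "\n--",
# so a match can never run into the next paragraph; the final \n makes it
# end on a line break.
sample | grep -zoP '(?xs)--\nare(?:(?!\n--).)*\n' | tr -d '\0'
```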
sed