Grep regular expression solution (greedy not working)

Tags: command-line, grep, regular-expression, text-processing

I have the following text in the file data.txt:

:MENU1
0. public
1. admin
2. webmail

:SYNTAX
! opt1, ... :

:ERROR1
Error #1, blah... blah.. blah...
Please do ...

:ERROR2
Error #2 ...

and I want to use a regular expression (Perl syntax) to extract the part from :MENU1 up to the first : that follows it, but without the :MENU1 line itself and without that final : in the result.

I've tried several regexes, but with the closest one I got, I can't get the non-greedy matching to work, and I can't discard the trailing ":".

grep -Poz "^:MENU1\K[\w\W]*:" data.txt

This works with grep, but it returns all the text up to the last ":" in the file.
I want only the text up to the first ":" after :MENU1, i.e.:

0. public
1. admin
2. webmail
 

(note the final blank line)

Best Answer

The pattern [\w\W]*: is greedy: * matches as much as it can, so the match runs to the last : in the file. To stop at the next : you need the lazy (non-greedy) quantifier *?, i.e. [\w\W]*?:. E.g.:

% grep -Poz '^:MENU1\K[\w\W]*?:' data.txt 

0. public
1. admin
2. webmail

:
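A minimal sketch of the greedy/lazy difference in isolation, assuming GNU grep built with PCRE support (-P). The input string is made up for the demo, and no -z is needed because it is a single line:

```shell
#!/bin/sh
# Greedy vs. lazy quantifier (assumes GNU grep with -P support).
input='start:one:two:'

# Greedy: .* grabs as much as possible, then backtracks to the LAST ':'.
greedy=$(printf '%s\n' "$input" | grep -Po 'start.*:')

# Lazy: .*? grabs as little as possible, so it stops at the FIRST ':'.
lazy=$(printf '%s\n' "$input" | grep -Po 'start.*?:')

echo "greedy: $greedy"   # greedy: start:one:two:
echo "lazy:   $lazy"     # lazy:   start:
```

The only change between the two commands is the ? after *, which is exactly the fix applied to the original [\w\W]*: pattern.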

You can strip the leading blank line by matching the newline that follows :MENU1 before the \K. E.g.:

% grep -Poz '^:MENU1\n\K[\w\W]*?:' data.txt 
0. public
1. admin
2. webmail

:
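The commands here rely on \K, which in PCRE discards everything matched so far from the reported match while still requiring it to match. That is why putting \n before \K removes the leading blank line: the newline is consumed but not printed. A small illustration, assuming GNU grep with -P; the id=42 string is a made-up example:

```shell
#!/bin/sh
# \K resets the start of the reported match (assumes GNU grep -P).
line='id=42 name=demo'

# Without \K, -o prints the "id=" prefix as part of the match.
with_prefix=$(printf '%s\n' "$line" | grep -Po 'id=\d+')

# With \K, "id=" must still match but is dropped from the output.
value_only=$(printf '%s\n' "$line" | grep -Po 'id=\K\d+')

echo "$with_prefix"   # id=42
echo "$value_only"    # 42
```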

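To drop the trailing blank line and the : as well, replace the final : with a look-ahead, (?=\n+:), which requires one or more newlines followed by a : but leaves them out of the match. E.g.: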

% grep -Poz '^:MENU1\n\K[\w\W]*?(?=\n+:)' data.txt 
0. public
1. admin
2. webmail
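To see what the look-ahead changes, here is a sketch comparing a consuming match with an asserted one on a single made-up line (assuming GNU grep with -P). The trailing text must be present in both cases, but only the consuming version prints it:

```shell
#!/bin/sh
# Look-ahead: require trailing text without including it (assumes GNU grep -P).
line='total: 10 USD'

# Consuming version: " USD" is part of the match, so -o prints it.
consumed=$(printf '%s\n' "$line" | grep -Po '\d+ USD')

# Look-ahead version: (?= USD) must match, but is not part of the output.
asserted=$(printf '%s\n' "$line" | grep -Po '\d+(?= USD)')

echo "$consumed"   # 10 USD
echo "$asserted"   # 10
```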

Next, we can simplify your character class to match anything except a colon:

% grep -Poz '^:MENU1\n\K[^:]*?(?=\n+:)' data.txt 
0. public
1. admin
2. webmail
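A sketch of why [^:] helps, assuming GNU grep with -P and a made-up one-line input: [\w\W] matches any character, whereas [^:] cannot match a colon, so even a greedy * over it stops at the next ":".

```shell
#!/bin/sh
# A negated class cannot cross the character it excludes (assumes GNU grep -P).
line='MENU1:public admin webmail:SYNTAX:opt1'

# [\w\W]* matches any character, so the greedy star runs to the end of the line.
any_char=$(printf '%s\n' "$line" | grep -Po 'MENU1:\K[\w\W]*')

# [^:]* cannot match ':', so even the greedy star stops at the next colon.
no_colon=$(printf '%s\n' "$line" | grep -Po 'MENU1:\K[^:]*')

echo "$any_char"   # public admin webmail:SYNTAX:opt1
echo "$no_colon"   # public admin webmail
```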

And finally, we can rewrite the initial part of the match as a look-behind instead of using ^ and \K:

% grep -Poz '(?<=:MENU1\n)[^:]*?(?=\n+:)' data.txt 
0. public
1. admin
2. webmail

This is similar to what @terdon came up with, but it handles the blank lines without a second call to grep.

This final regex uses look-around assertions. (?<=pattern) is a look-behind assertion: the pattern must match immediately before the current position, but it is not included in the output. (?=pattern) is a look-ahead assertion: the pattern must match immediately after the current position, again without being included in the output.
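An end-to-end sketch of the final command, assuming GNU grep with -P and -z. It recreates the question's data.txt in a scratch directory and wraps the regex in a helper; extract_section is a hypothetical name chosen for illustration, and it assumes the section name contains no regex metacharacters. Because -o with -z ends each match with a NUL byte, the output is piped through tr to remove it.

```shell
#!/bin/sh
# End-to-end run of the look-around regex (assumes GNU grep with -P and -z).
# extract_section is a hypothetical helper, not part of grep.

# Recreate the sample data.txt from the question in a scratch directory.
workdir=$(mktemp -d)
cat > "$workdir/data.txt" <<'EOF'
:MENU1
0. public
1. admin
2. webmail

:SYNTAX
! opt1, ... :

:ERROR1
Error #1, blah... blah.. blah...
Please do ...

:ERROR2
Error #2 ...
EOF

# Print the body of section $1 from file $2.
# Assumes $1 has no regex metacharacters and the body contains no ':'.
extract_section() {
    # -z reads the whole file as one record, so \n is an ordinary character;
    # -o terminates each match with a NUL byte, which tr deletes.
    grep -Poz "(?<=:$1\n)[^:]*?(?=\n+:)" "$2" | tr -d '\0'
}

menu=$(extract_section MENU1 "$workdir/data.txt")
error1=$(extract_section ERROR1 "$workdir/data.txt")
printf '%s\n' "$menu"

rm -r "$workdir"
```

Because the body is matched with [^:], a section whose text contains a colon (:SYNTAX here) produces no output, and so does the last section (:ERROR2), since no ":" follows it for the look-ahead to find.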
