Text Processing – Can Grep Output Only Specified Groupings That Match

grep, regular expression, text processing

Say I have a file:

# file: 'test.txt'
foobar bash 1
bash
foobar happy
foobar

I only want to know what words appear after "foobar", so I can use this regex:

"foobar \(\w\+\)"

The parentheses indicate that I have a special interest in the word right after foobar. But when I run grep "foobar \(\w\+\)" test.txt, I get the entire lines that match the whole regex, rather than just "the word after foobar":

foobar bash 1
foobar happy

I would much prefer that the output of that command looked like this:

bash
happy

Is there a way to tell grep to only output the items that match the grouping (or a specific grouping) in a regular expression?

Best Answer

GNU grep has the -P option for Perl-style regexes, and the -o option to print only what matches the pattern. Combining the two with look-around assertions (described under Extended Patterns in the perlre manpage) lets you exclude part of the pattern from the text that -o prints.

$ grep -oP 'foobar \K\w+' test.txt
bash
happy
$

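The \K is a shorter (and more efficient) alternative to a zero-width look-behind assertion, (?<=pattern), placed before the text you want to output: it keeps everything matched so far out of the reported match. Unlike a PCRE look-behind, the part before \K does not have to be fixed-length. (?=pattern) can be used as a zero-width look-ahead assertion after the text you want to output.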
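To see the two forms side by side, here is a minimal sketch that recreates the question's test.txt in a temporary directory and runs both patterns. It assumes GNU grep built with PCRE support (the -P option); the variable names tmpdir, with_k and with_lookbehind are only for illustration.

```shell
#!/bin/sh
# Recreate the sample file from the question in a scratch directory.
tmpdir=$(mktemp -d)
cat > "$tmpdir/test.txt" <<'EOF'
foobar bash 1
bash
foobar happy
foobar
EOF

# \K drops "foobar " from the reported match; -o prints only what is left.
with_k=$(grep -oP 'foobar \K\w+' "$tmpdir/test.txt")

# The same extraction written with an explicit look-behind.
with_lookbehind=$(grep -oP '(?<=foobar )\w+' "$tmpdir/test.txt")

printf '%s\n' "$with_k"           # prints: bash, happy (one per line)
printf '%s\n' "$with_lookbehind"  # prints the same two lines

rm -rf "$tmpdir"
```

Both commands should print bash and happy. The lines "bash" and "foobar" produce nothing, because neither contains "foobar " followed by a word.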

For instance, if you wanted to match the word between foo and bar, you could use:

$ grep -oP 'foo \K\w+(?= bar)' test.txt

or (for symmetry)

$ grep -oP '(?<=foo )\w+(?= bar)' test.txt
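Neither command prints anything for the test.txt above, since no line there has a word between "foo " and " bar". A minimal sketch with made-up input (the sample lines and variable names are assumptions for illustration, and GNU grep with -P support is assumed) shows what the two patterns extract:

```shell
#!/bin/sh
# Hypothetical input, not the question's test.txt.
input='foo middle bar
foo only
foo left bar and foo right bar'

# \K form: "foo " is required but excluded from the output;
# (?= bar) requires " bar" to follow without consuming it.
between_k=$(printf '%s\n' "$input" | grep -oP 'foo \K\w+(?= bar)')

# Symmetric form using a look-behind and a look-ahead.
between_lb=$(printf '%s\n' "$input" | grep -oP '(?<=foo )\w+(?= bar)')

printf '%s\n' "$between_k"   # prints: middle, left, right (one per line)
printf '%s\n' "$between_lb"  # prints the same three lines
```

The line "foo only" is skipped because nothing satisfies the look-ahead, and the last line yields two matches because -o prints every non-overlapping match on a line.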