Text Processing – Does GNU grep’s -o Option Ignore Zero-Length Matches?

Tags: grep, pcre, text processing

I found an answer on another site that suggested grep -oP '^\w+|$'. I pointed out that the |$ is pointless in PCRE, since it just means "OR end of line" and will therefore always match on regular lines. However, I can't figure out exactly what it does in GNU grep's PCRE mode when combined with -o. Consider the following:

$ printf 'ab\na\nc\n\n' | perl -ne 'print if /ab|$/'
ab
a
c

$

(I am including the second prompt ($) character to show that the empty line is included in the results).

As expected, in Perl, that matches every line, either because the line contains ab or because the $ matches at the end of the line. GNU grep behaves the same way without the -o flag:

$ printf 'ab\na\nc\n\n' | grep -P 'ab|$'
ab
a
c

$

However, -o changes the behavior:

$ printf 'ab\na\nc\n\n' | grep -oP 'ab|$'
ab
$

This is the same as simply grepping for ab. The second part, the "OR end of line", seems to be ignored, even though it works as expected without the -o flag, as shown above.

What's going on? Does -o ignore zero-length matches? Is that a bug, or is it expected?

Best Answer

My GNU grep man page says the following:

-o, --only-matching

Print only the matched (non-empty) parts of a matching line, with each such part on a separate output line.

(The emphasis on "non-empty" is mine.)

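I'm guessing it considers the end-of-line match to be an "empty match": $ matches a position rather than any characters, so the match has zero length and -o does not print it.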
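
One way to see the difference between "lines that match" and "parts that -o prints" is to count the former and capture the latter. This is only a sketch, and it assumes GNU grep built with PCRE support (the -P option); the variable names selected and printed are just illustrative:

```shell
# Assumes GNU grep with PCRE support (-P).

# Count the lines selected by the pattern. Every line is selected,
# because $ matches at the end of each line, including the empty one.
selected=$(printf 'ab\na\nc\n\n' | grep -cP 'ab|$')

# Capture what -o prints. The zero-length matches of $ are skipped,
# so only the non-empty match "ab" is printed.
printed=$(printf 'ab\na\nc\n\n' | grep -oP 'ab|$')

echo "selected lines: $selected"
echo "printed by -o: $printed"
```

With the input from the question, this should report 4 selected lines but only ab printed by -o, which is consistent with the man page wording: the lines still match, but their empty matches are not output.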
