grep
's name comes from the g/re/p
ed
command. Its primary purpose is to print the lines that match a regexp. It's not its role to edit the content of those lines. You have sed
(the stream editor) or awk
for that.
Now, some grep
implementations, starting with GNU grep
, have added a -o
option to print the matched portion of each line (what is matched by the regexp, not its capture groups). Some grep
implementations, like GNU's again (with -P
) or pcregrep
, also support PCREs for their regexps.
pcregrep
actually added a -o<n>
option to print the content of a capture group. So you could do:
pcregrep -o1 -o2 --om-separator=' ' '\.zoo\.(\d+).*:\s+(.*)'
But here, the obvious standard solution is to use sed
:
sed -n 's/^.*\.zoo\.\([0-9]\{1,\}\).*:[[:space:]]\{1,\}/\1 /p'
Or if you want perl regexps, use perl:
perl -lne 'print "$1 $2" if /\.zoo\.(\d+).*:\s+(.*)/'
With GNU grep
, if you don't mind the matches appearing on separate lines, you can do:
$ grep -Po '\.zoo\.\K\d+|:\s+\K.*' < file
2
0.45654343
Note that while \K
resets the start of the matched portion, that doesn't mean you can get away with the two parts of the alternation overlapping.
grep -Po '\.zoo\.(\K\d+|.*:\s+\K.*)'
would not work, just like echo foobar | grep -Po 'foo|foob'
wouldn't work (at printing both foo
and foob
). foo|foob
first matches foo
and then grep
looks for potential other matches in the input after the foo
, so starting at the b
of bar
, and so cannot find any further match.
Above with grep -Po '\.zoo\.\K\d+|:\s+\K.*'
, we only look for :<spaces><anything>
in the second part of the alternation. That does match in the part that is after .zoo.<digits>
but that also means it would find those :<spaces><anything>
anywhere in the input, not only when they follow .zoo.<digits>
.
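A short sketch of that caveat, assuming a GNU grep built with -P support. The two input lines and the function name zoo_grep are hypothetical.

```shell
# The pattern from the text; requires GNU grep with PCRE (-P) support.
zoo_grep() {
  grep -Po '\.zoo\.\K\d+|:\s+\K.*'
}

# Intended case: the colon part follows .zoo.<digits>.
printf '%s\n' 'a.zoo.2.b: 0.45654343' | zoo_grep

# Unintended case: there is no .zoo. at all, yet the colon part still matches.
printf '%s\n' 'unrelated: text' | zoo_grep
```

The second command prints text even though the line contains no .zoo.<digits> part.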
There is a way to work around that though, using another PCRE special operator: \G
. \G
matches at the start of the subject. For a single match, that's equivalent to ^
, but with multiple matches (think of sed
/perl
's g
flag in s/.../.../g
) like with -o
where grep
tries to find all the matches in the line, that also matches after the end of the previous match. So if you make it:
grep -Po '\.zoo\.\K\d+|(?!^)\G.*:\s+\K.*'
Where (?!^)
is a negative look-ahead operator that means not at the beginning of the line, that \G
will only match after a previous successful (non-empty) match, so .*:\s+\K.*
will only match if it follows a previous successful match, and that can only be the .zoo.<digits>
one, since the other part of the alternation matches until the end of the line.
On an input like:
.zoo.1.zoo.2 tar: blah
That would output:
1
2
blah
If you did not want that, though, you'd also want the first part of the alternation to only match at the beginning of the line. Something like:
grep -Po '^.*?\.zoo\.\K\d+|(?!^)\G.*:\s+\K.*'
That still outputs 2
on an input like .zoo.2 no colon character
or .zoo.2 blah:
. You could work around that with a look-ahead operator in the first part of the alternation that looks for at least one non-space after :<spaces>
(and also using $
to avoid issues with non-characters):
grep -Po '^.*?\.zoo\.\K\d+(?=.*:\s+\S.*$)|(?!^)\G.*:\s+\K\S.*$'
You'd probably need a few pages of comments to explain that regexp, so I would still go for the straightforward sed
/perl
solutions...
No, -H
and -o
are not mutually exclusive. You may have a Carriage Return character in the part that is matched. This would make the following text be written at the start of the line, thus overwriting the file name.
$ printf 'foobar\n' | grep -Ho '.bar'
(standard input):obar
$ printf 'foo\rbar\n' | grep -Ho '.bar'
bar
Also, since both lines belong to the same file baz.txt
(if your example is correct), this may also be due to a long line (larger than the screen width) being wrapped to the next line.
$ printf 'foo%80sbar\n' | grep -Ho 'foo.*'
(standard input):foo
bar
Whether one of those scenarios may apply to your situation really depends on your search regex and the content of your files.
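To tell whether a carriage return is hiding in the matched text, one option is to pipe the output through something that prints control characters visibly. A minimal sketch using the standard sed l command follows; the function name reveal is made up for the example.

```shell
# sed's "l" command prints each line unambiguously: a carriage return
# shows up as \r and the end of the line is marked with $.
reveal() {
  sed -n l
}

printf 'foo\rbar\n' | grep -Ho '.bar' | reveal
```

If a \r appears right after the file name prefix, the terminal was overwriting the prefix rather than grep omitting it. Note that sed l folds long lines, so this is only meant for short matches.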
With basic regular expressions (grep's default), the characters (, | and ) need to be escaped to act as operators, so you should write them as \(, \| and \). You don't need the escapes when you use the extended regex (-E) option. See man grep, section "Basic vs Extended Regular Expressions".