I have a working perl regex using grep. I am trying to understand how it works.
Here is the command.
grep -oP '(?<=location>)[^<]+' testFile1.xml
Here are the contents of testFile1.xml
<con:location>C:/test/file1.txt</con:location></con:dataFile></con:dataFiles></con:groupFile>
And this is the result
C:/test/file1.txt
I am trying to understand the regex, i.e. this part (?<=location>)[^<]+
Best Answer
(?<=...) is a look-behind PCRE operator. By itself, it doesn't match anything but acts as a condition (that what's on the left matches ...).

(?<=X)Y matches Y provided that what's on the left matches X. In blahYfooXYbar, that matches the second Y; the X is not part of what is being matched. The (?<=X) itself matches the zero-width (imaginary) spot just before that Y.

Because with -o, grep only prints the matched portion, that's a way to make it print what's after the location> (here what matches [^<]+: one or more (+) non-< characters ([^<]), so everything up to (but not including) the next < character or the end of the line, provided it's not empty).

Another approach is to use \K (in newer versions of PCRE) to reset the start of the matched portion.

Note that -P and -o are GNU extensions. With recent versions (8.11 or over) of pcregrep (another grep implementation that uses PCRE), you can also use -o1, which prints what's captured by the 1st (here unique) (...).
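
The points above can be reproduced with a short sketch. It assumes GNU grep built with PCRE support and perl on the PATH. The temporary file and the variable names (marked, lookbehind, reset) are only there for illustration, and the pcregrep step is skipped when that tool isn't installed.

```shell
# Sample line modelled on the question's file, written to a throwaway
# temporary file (not the asker's real testFile1.xml).
tmp=$(mktemp)
printf '%s\n' '<con:location>C:/test/file1.txt</con:location></con:dataFile>' > "$tmp"

# 1. Mark the zero-width spot that (?<=X) matches: the substitution
#    inserts text there without consuming any character.
marked=$(printf '%s\n' 'blahYfooXYbar' | perl -pe 's/(?<=X)/<there>/')
printf '%s\n' "$marked"        # prints blahYfooX<there>Ybar

# 2. Look-behind: "location>" must precede the match but is not printed.
lookbehind=$(grep -oP '(?<=location>)[^<]+' "$tmp")
printf '%s\n' "$lookbehind"    # prints C:/test/file1.txt

# 3. \K: "location>" is matched, then dropped from the reported match.
reset=$(grep -oP 'location>\K[^<]+' "$tmp")
printf '%s\n' "$reset"         # prints C:/test/file1.txt

# 4. pcregrep, if available: print only capture group 1.
if command -v pcregrep >/dev/null 2>&1; then
    pcregrep -o1 'location>([^<]+)' "$tmp"
fi

rm -f "$tmp"
```

The closing tag </con:location> also contains location>, but it is immediately followed by <, so [^<]+ has nothing to match there and only the path is printed.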