Grep – How to Find Text Between Two Strings

grep, text processing

I am trying to extract a value from a long string that may change over time. For example, the string could look something like this:

....../filename-1.9.0.3.tar.gz"<....

What I want to extract is the value between filename- and .tar.gz, essentially the file version (1.9.0.3 in this case). I need to do it this way because the next time I run the command the value may be 1.9.0.6 or 2.0.0.2 or something entirely different.

How can I do this? I'm currently only using grep, but I wouldn't mind using other utilities such as sed, awk, or cut. To be perfectly clear, I need to extract only the file version part of the string. The string is very long on both sides, so everything else needs to be cut out.

Best Answer

With grep -P/pcregrep, using a positive look-behind and a positive look-ahead:

grep -P -o '(?<=STRING1).*?(?=STRING2)' infile

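In your case, replace STRING1 with filename- and STRING2 with \.tar\.gz. The lazy .*? stops at the first STRING2 after each STRING1, and -o prints only the matched text, so the look-behind and look-ahead themselves are left out of the output.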
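Applied to the sample string from the question, the command could look like the sketch below. It assumes a grep built with PCRE support (GNU grep's -P); the variable names line and version are only for illustration.

```shell
# Sample input from the question, held in a variable for illustration.
line='....../filename-1.9.0.3.tar.gz"<....'

# -o prints only the match; the look-behind and look-ahead are not part of it.
version=$(printf '%s\n' "$line" | grep -oP '(?<=filename-).*?(?=\.tar\.gz)')

printf '%s\n' "$version"   # prints 1.9.0.3
```

If the input contains several filename-...tar.gz pairs on one line, each match is printed on its own line.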


If you don't have pcregrep and your grep doesn't support -P, you can do this with your favourite text processing tool. Here's a portable way with ed that gives you the same output:

ed -s infile <<\IN
g/STRING1/s//\
&/g
v/STRING1.*STRING2/d
,s/STRING1//
,s/STRING2.*//
,p
IN

How it works: a newline is inserted before each STRING1 occurrence (so there is now at most one occurrence per line), then all lines not matching STRING1.*STRING2 are deleted. On the remaining lines, the leading STRING1 and everything from STRING2 onward are removed, which leaves only what's between STRING1 and STRING2, and the result is printed.
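Since the question also mentions sed, here is a rough sed sketch for the same sample string. This is a different technique from the ed script above, not an equivalent: the patterns are greedy, so it returns at most one value per line (taken from the last filename- on that line), which is enough when each line holds a single file name. The variable names are again only illustrative.

```shell
# Sample input from the question, held in a variable for illustration.
line='....../filename-1.9.0.3.tar.gz"<....'

# -n suppresses default output; the p flag prints only lines where the
# substitution succeeded. \(...\) captures the text between the two strings.
version=$(printf '%s\n' "$line" | sed -n 's/.*filename-\(.*\)\.tar\.gz.*/\1/p')

printf '%s\n' "$version"   # prints 1.9.0.3
```

To read from a file instead, drop the printf and pass the file name as the last argument to sed.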
