With grep -P/pcregrep, using a positive look-behind and a positive look-ahead:
grep -P -o '(?<=STRING1).*?(?=STRING2)' infile
In your case, replace STRING1 with filename- and STRING2 with \.tar\.gz.
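As an illustration, here are those substitutions applied to a made-up input line (the sample text is an assumption, not taken from the question). This needs a grep that supports -P, such as GNU grep built with PCRE. The non-greedy .*? stops at the first .tar.gz after each filename-, so each version string should come out on its own line:

```shell
#!/bin/sh
# STRING1 = filename-, STRING2 = \.tar\.gz, plugged into the
# look-behind / look-ahead pattern. The sample line is invented.
printf '%s\n' 'get filename-1.2.3.tar.gz or filename-2.0.tar.gz' |
  grep -P -o '(?<=filename-).*?(?=\.tar\.gz)'
# Expected output, one match per line:
# 1.2.3
# 2.0
```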
If you don't have access to pcregrep and/or your grep doesn't support -P, you can do this with your favourite text processing tool. Here's a portable way with ed that gives you the same output:
ed -s infile <<\IN
g/STRING1/s//\
&/g
v/STRING1.*STRING2/d
,s/STRING1//
,s/STRING2.*//
,p
IN
How it works: a newline is prepended to each STRING1 occurrence (so there is now at most one occurrence per line), then all lines not matching STRING1.*STRING2 are deleted; on the remaining lines, only what's between STRING1 and STRING2 is kept, and the result is printed.
To print the last block running from StartPattern to EndPattern, you can always do:
tac < fileName | sed '/EndPattern/,$!d;/StartPattern/q' | tac
If your system doesn't have GNU tac, you may be able to use tail -r instead.
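To see what the tac+sed+tac pipeline does, here it is run on a small made-up input with two blocks. Once the input is reversed, sed deletes everything before the first EndPattern it meets (the last one in the file) and quits, after printing, at the first StartPattern that follows, so only the last block should survive:

```shell
#!/bin/sh
# Invented input: two StartPattern...EndPattern blocks with filler lines.
# tac is GNU coreutils; on other systems tail -r may be a substitute.
printf '%s\n' a StartPattern 1 EndPattern b StartPattern 2 EndPattern c |
  tac | sed '/EndPattern/,$!d;/StartPattern/q' | tac
# Expected output:
# StartPattern
# 2
# EndPattern
```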
You can also do it with awk:
awk '
  inside {
    text = text $0 RS
    if (/EndPattern/) inside = 0
    next
  }
  /StartPattern/ {
    inside = 1
    text = $0 RS
  }
  END {printf "%s", text}' < filename
But that means reading the whole file.
Note that it may give different results from the tac+sed+tac approach if there's another StartPattern between a StartPattern and the next EndPattern, if the last StartPattern does not have a matching EndPattern, or if there are lines matching both StartPattern and EndPattern.
awk '
  /StartPattern/ {
    inside = 1
    text = ""
  }
  inside {text = text $0 RS}
  /EndPattern/ {inside = 0}
  END {printf "%s", text}' < filename
That would make it behave more like the tac+sed+tac approach (except in the case of an unclosed trailing StartPattern).
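One way to see the difference is to feed the same made-up input, where a second StartPattern appears before the first block is closed, to all three approaches. The function names below (last_block_tac, last_block_awk1, last_block_awk2) are hypothetical wrappers around the commands above, reading standard input instead of filename. The first awk script should keep the outer StartPattern and everything up to EndPattern, while the second one should agree with tac+sed+tac:

```shell
#!/bin/sh
# Hypothetical wrappers around the three approaches; each reads stdin.
last_block_tac() {
  tac | sed '/EndPattern/,$!d;/StartPattern/q' | tac
}

# First awk version: a StartPattern seen while inside is just more text.
last_block_awk1() {
  awk '
    inside {
      text = text $0 RS
      if (/EndPattern/) inside = 0
      next
    }
    /StartPattern/ {
      inside = 1
      text = $0 RS
    }
    END {printf "%s", text}'
}

# Second awk version: every StartPattern restarts the block.
last_block_awk2() {
  awk '
    /StartPattern/ {
      inside = 1
      text = ""
    }
    inside {text = text $0 RS}
    /EndPattern/ {inside = 0}
    END {printf "%s", text}'
}

# Invented input with a nested StartPattern.
input='StartPattern
a
StartPattern
b
EndPattern
c'

printf '%s\n' "$input" | last_block_tac    # StartPattern, b, EndPattern
printf '%s\n' "$input" | last_block_awk1   # StartPattern, a, StartPattern, b, EndPattern
printf '%s\n' "$input" | last_block_awk2   # StartPattern, b, EndPattern
```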
That last one seems closest to your edited requirements. Adding the warning is then simply:
awk '
  /StartPattern/ {
    inside = 1
    text = ""
  }
  inside {text = text $0 RS}
  /EndPattern/ {inside = 0}
  END {
    printf "%s", text
    if (inside)
      print "Warning: EOF reached without seeing the end pattern" > "/dev/stderr"
  }' < filename
To avoid reading the whole file:
tac < filename | awk '
  /StartPattern/ {
    printf "%s", $0 RS text
    if (!inside)
      print "Warning: EOF reached without seeing the end pattern" > "/dev/stderr"
    exit
  }
  /EndPattern/ {inside = 1; text = ""}
  {text = $0 RS text}'
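As a quick check that the two warning-emitting versions agree, here they are wrapped in hypothetical functions (last_block_warn and last_block_tac_warn, reading stdin rather than filename) and run on a made-up input whose last StartPattern is never closed. Both should print the unclosed trailing block on stdout and the warning on stderr:

```shell
#!/bin/sh
# Whole-file version: remembers the last block, warns if still inside at EOF.
last_block_warn() {
  awk '
    /StartPattern/ {
      inside = 1
      text = ""
    }
    inside {text = text $0 RS}
    /EndPattern/ {inside = 0}
    END {
      printf "%s", text
      if (inside)
        print "Warning: EOF reached without seeing the end pattern" > "/dev/stderr"
    }'
}

# Reversed version: stops at the first StartPattern met from the end.
last_block_tac_warn() {
  tac | awk '
    /StartPattern/ {
      printf "%s", $0 RS text
      if (!inside)
        print "Warning: EOF reached without seeing the end pattern" > "/dev/stderr"
      exit
    }
    /EndPattern/ {inside = 1; text = ""}
    {text = $0 RS text}'
}

# Invented input: one closed block, then an unclosed one.
input='StartPattern
a
EndPattern
StartPattern
b'

# Each should print "StartPattern" and "b", plus the warning on stderr.
printf '%s\n' "$input" | last_block_warn
printf '%s\n' "$input" | last_block_tac_warn
```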
Portability note: for /dev/stderr, you need either a system with such a special file (beware that on Linux, if stderr is open on a seekable file, this writes the text at the beginning of the file instead of at the current position within the file) or an awk implementation that emulates it, like gawk, mawk or busybox awk (those work around the Linux issue mentioned above).
On other systems, you can replace print ... > "/dev/stderr" with print ... | "cat>&2".
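For instance, the whole-file version with the warning rewritten to use the cat>&2 workaround might look like this (last_block_warn_portable is a hypothetical name, and the two-line input is made up). The warning text is piped to a cat process whose standard output is the original stderr, so no /dev/stderr file is needed:

```shell
#!/bin/sh
# Same logic as the warning version above; only the warning's
# destination changes: it goes through a "cat>&2" pipe.
last_block_warn_portable() {
  awk '
    /StartPattern/ {
      inside = 1
      text = ""
    }
    inside {text = text $0 RS}
    /EndPattern/ {inside = 0}
    END {
      printf "%s", text
      if (inside)
        print "Warning: EOF reached without seeing the end pattern" | "cat>&2"
    }'
}

# Unclosed StartPattern: prints the block, warning goes to stderr.
printf '%s\n' StartPattern a | last_block_warn_portable
```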
If you have GNU grep, you can use its -o option to search for a regex and output only the matching part. (-o is not specified by POSIX, so some other grep implementations can only show the whole line.) If there are several matches on one line, they are printed on separate lines.
If you only want the digits and not the brackets, it's a little harder; you need to use a zero-width assertion: a regexp that matches the empty string, but only if it is preceded, or followed as the case may be, by a bracket. In grep, look-around assertions like these are only available with Perl syntax (-P).
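A sketch of what this could look like, assuming the target is a run of digits in square brackets (the regex \[[0-9]*\] used in the Perl part below) and a made-up sample line. The closing bracket is written without a backslash here, since ] is an ordinary character outside a bracket expression in POSIX regexes. The second command assumes a grep with -P support:

```shell
#!/bin/sh
line='id [12] and [345] here'   # invented sample input

# Whole match, brackets included; one match per output line.
printf '%s\n' "$line" | grep -o '\[[0-9]*]'
# [12]
# [345]

# Digits only: (?<=\[) and (?=\]) match the empty string next to a
# bracket, so the brackets themselves are not part of the output.
printf '%s\n' "$line" | grep -oP '(?<=\[)[0-9]*(?=\])'
# 12
# 345
```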
With sed, you need to turn off printing with -n, match the whole line, and retain only the matching part. If there are several possible matches on one line, only the last match is printed. See Extracting a regex matched with 'sed' without printing the surrounding characters for more details on using sed here.
Or, if you only want the digits and not the brackets:
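Here is a sketch of both sed forms, under the same assumption of bracketed digits and the same made-up line. Because the leading ^.* is greedy, it swallows as much as it can, which is why only the last bracketed number on the line survives; the p flag prints only lines where the substitution succeeded:

```shell
#!/bin/sh
line='id [12] and [345] here'   # invented sample input

# Keep the bracketed match; -n suppresses the default output.
printf '%s\n' "$line" | sed -n 's/^.*\(\[[0-9]*]\).*/\1/p'
# [345]

# Digits only: move the capture group inside the brackets.
printf '%s\n' "$line" | sed -n 's/^.*\[\([0-9]*\)].*/\1/p'
# 345
```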
Without grep -o, Perl is the tool of choice here if you want something that's both simple and comprehensible. On every line (-n), if the line contains a match for \[[0-9]*\], print that match ($&) and a newline (-l). If you only want the digits, put parentheses in the regex to delimit a group, and print only that group.
P.S. If you want to require one or more digits between the brackets, change [0-9]* to [0-9][0-9]*, or to [0-9]+ in Perl.