GNU grep has the -P option for Perl-style regexes, and the -o option to print only what matches the pattern. These can be combined with look-around assertions (described under Extended Patterns in the perlre manpage) to exclude part of the pattern from what -o reports as the match.
$ grep -oP 'foobar \K\w+' test.txt
bash
happy
$
\K is a shorter (and more efficient) alternative to (?<=pattern), the zero-width look-behind assertion you place before the text you want to output: everything matched before \K is dropped from the reported match. (?=pattern) is the zero-width look-ahead assertion, placed after the text you want to output.
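To make the comparison concrete, here is a minimal sketch that runs the \K form and the look-behind form over the same input. The contents of test.txt are not shown above, so the sample lines and the variable names (sample, with_k, with_lookbehind) are assumptions made for illustration. It also assumes a GNU grep built with PCRE support.

```shell
# Hypothetical sample input; the original test.txt is not shown in the text.
sample='say foobar bash today
no match on this line
be foobar happy now'

# \K form: "foobar " has to match, but is dropped from the reported match.
with_k=$(printf '%s\n' "$sample" | grep -oP 'foobar \K\w+')

# Look-behind form: "foobar " has to precede the match and is never part of it.
with_lookbehind=$(printf '%s\n' "$sample" | grep -oP '(?<=foobar )\w+')

printf '%s\n' "$with_k"
printf '%s\n' "$with_lookbehind"
```

On this input both commands should print the same two words, which is the sense in which the two forms are interchangeable here.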
For instance, if you wanted to match the word between foo and bar, you could use:
$ grep -oP 'foo \K\w+(?= bar)' test.txt
or (for symmetry)
$ grep -oP '(?<=foo )\w+(?= bar)' test.txt
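As a rough check that the two spellings behave alike, the sketch below feeds both patterns the same made-up input. The sample lines and variable names are hypothetical, chosen so that one line lacks the trailing bar and should therefore produce no output.

```shell
# Hypothetical sample input: the middle line ends in "baz", not "bar".
sample='foo alpha bar
foo beta baz
foo gamma bar'

# \K before the word, look-ahead after it.
k_form=$(printf '%s\n' "$sample" | grep -oP 'foo \K\w+(?= bar)')

# Look-behind before the word, look-ahead after it.
symmetric_form=$(printf '%s\n' "$sample" | grep -oP '(?<=foo )\w+(?= bar)')

printf '%s\n' "$k_form"
printf '%s\n' "$symmetric_form"
```

Only the word itself is printed in each case, because neither the look-ahead nor the text before \K counts as part of the match.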
If you want all of the numbers after "UUIDs in this bucket", you can use sed like so:
$ zcat file.gz | sed -n 's/^.*UUIDs in this bucket //p'
8501792126581991569,8073766106536916628,4830289023695906800,6135982080116553120,8306484440313978157,9040948912536460872,8471856544054164043,5431263453539111247,7661719762428556576
6501792126581991569,8073766106536916628,4830289023695906800,6135982080116553120,8306484440313978157,9040948912536460872,8471856544054164043,5431263453539111247,7661719762428556576
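The same extraction can also be written with the grep -oP and \K technique described earlier. The sketch below runs both on a small stand-in for the log; the log lines, the short IDs, and the variable names are all made up for illustration, since the real file.gz is not shown.

```shell
# Hypothetical stand-in for the decompressed log.
log='INFO UUIDs in this bucket 111,222,333
INFO some unrelated message
INFO UUIDs in this bucket 444,222'

# sed: delete everything up to and including the marker, print only changed lines.
from_sed=$(printf '%s\n' "$log" | sed -n 's/^.*UUIDs in this bucket //p')

# grep: match the marker, discard it with \K, report the rest of the line.
from_grep=$(printf '%s\n' "$log" | grep -oP 'UUIDs in this bucket \K.*')

printf '%s\n' "$from_sed"
printf '%s\n' "$from_grep"
```

The sed version is more portable because it does not need GNU grep or PCRE support.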
Or, use perl and output the full SQL statement:
$ zcat file.gz | perl -ne 'chomp;if(s/^.*UUIDs in this bucket //){@uuids=split(/,/); $k{$_}++ for @uuids} END{ print "insert into sometable (uuid) values (" , join ",",map{qq/"$_"/} keys(%k); print ");\n"}'
insert into sometable (uuid) values ("6135982080116553120","4830289023695906800","8501792126581991569","9040948912536460872","7661719762428556576","8471856544054164043","8306484440313978157","6501792126581991569","5431263453539111247","8073766106536916628");
Or, slightly more legibly:
$ zcat file.gz |
    perl -ne 'chomp;
      # Strip everything up to and including the marker text
      if (s/^.*UUIDs in this bucket //) {
        @uuids = split(/,/);
        $k{$_}++ for @uuids;   # hash keys de-duplicate the IDs
      }
      END {
        print "insert into sometable (uuid) values (",
              join(",", map { qq/"$_"/ } keys(%k));
        print ");\n";
      }'
insert into sometable (uuid) values ("6135982080116553120","4830289023695906800","8501792126581991569","9040948912536460872","7661719762428556576","8471856544054164043","8306484440313978157","6501792126581991569","5431263453539111247","8073766106536916628");
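If perl is not available, a similar de-duplicated statement can probably be assembled from standard tools alone. This is a sketch under assumptions, not the method used above: the sample log, the short IDs, and the variable names are hypothetical, and sort -u emits the IDs in sorted order rather than perl's hash order. The statement shape simply mirrors the one printed above.

```shell
# Hypothetical stand-in for the decompressed log.
log='INFO UUIDs in this bucket 111,222,333
INFO some unrelated message
INFO UUIDs in this bucket 444,222'

# 1. sed keeps only the comma-separated ID lists.
# 2. tr puts one ID on each line.
# 3. sort -u removes duplicates.
# 4. sed wraps each ID in double quotes.
# 5. paste joins the lines back together with commas.
values=$(printf '%s\n' "$log" |
  sed -n 's/^.*UUIDs in this bucket //p' |
  tr ',' '\n' |
  sort -u |
  sed 's/.*/"&"/' |
  paste -sd, -)

statement="insert into sometable (uuid) values ($values);"
printf '%s\n' "$statement"
```

With the real data, the printf feeding the pipeline would be replaced by zcat file.gz.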
Best Answer
Instead of extended-regex grep (-E), use perl-regex grep (-P) with a lookbehind and a lookahead.
Here, (?<=\[) indicates that there should be a preceding \[, and (?=\]) indicates that there should be a following \], but neither is included in the match output.
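The exact command from this answer is not preserved here, so the following is only a sketch of how the two assertions might be combined. The middle of the pattern ([^\]]+, meaning one or more characters that are not a closing bracket), the sample lines, and the variable names are assumptions.

```shell
# Hypothetical sample input with bracketed text.
sample='value [first] end
no brackets here
other [second item] end'

# (?<=\[) requires a preceding "[", (?=\]) requires a following "]";
# -o then prints only the text between them.
inside=$(printf '%s\n' "$sample" | grep -oP '(?<=\[)[^\]]+(?=\])')

printf '%s\n' "$inside"
```

Because both assertions are zero-width, the brackets themselves never appear in the output.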