I don't have any problem with [[:space:]]. Here's a really silly little example showing the mixed replacement of spaces and tabs:
$ echo -e 'A \t \t B' | sed 's/A[[:space:]]*B/WORKED/'
WORKED
You can also use \s (a GNU extension), which is often preferable in big sed expressions because it's much shorter:
$ echo -e 'A \t \t B' | sed 's/A\s*B/WORKED/'
WORKED
Anyway, I think your actual problem is escaping those troublesome single quotes. I find the easiest way is to break out of the single-quoted string, add a double-quoted single quote, and then (if needed) go back into a single-quoted string. Bash will automatically concatenate it all for you.
$ echo 'This is a nice string and this is a single quote:'"'"' Nice?'
This is a nice string and this is a single quote:' Nice?
So all the space we saved with \s is about to get destroyed by this mega-quote situation:
$ echo -e '$RELEASE \t = '"'"'1234'"'"';' |\
sed 's/$RELEASE\s*=\s*'"'"'[0-9]*'"'"'\;/REPLACEMENT/'
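If the '"'"' sequences get hard to read, one option is to keep the single quote in a shell variable. This is only a sketch: the variable name q is my own choice, and it uses [[:space:]] and an escaped \$ so that it does not lean on GNU-only behaviour.

```shell
# q is a hypothetical helper variable holding one literal single quote
q="'"

# Build the test line: $RELEASE, a space, a tab, a space, then = and the quoted number
input=$(printf '$RELEASE \t = %s1234%s;\n' "$q" "$q")

# Leave the single-quoted sed script only to splice in "$q"
result=$(printf '%s\n' "$input" |
  sed 's/\$RELEASE[[:space:]]*=[[:space:]]*'"$q"'[0-9]*'"$q"';/REPLACEMENT/')

printf '%s\n' "$result"
```

The whole assignment is matched, so this prints REPLACEMENT.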
Of course, there is an argument (because this looks like a PHP script) that if the line starts with $RELEASE\s*= you can just replace the whole line. Not always true, obviously (the entire app could be one hideous line), but it makes your search and replace more palatable:
sed 's/^\$RELEASE\s*=.*/REPLACEMENT/'
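To see the whole-line approach in action, here is a small sketch on a made-up two-line input (the variable names and sample lines are assumptions, not taken from the question). It anchors the match with ^ and uses [[:space:]] so that it should also work outside GNU sed.

```shell
# Hypothetical sample: only the first line is an assignment to $RELEASE
input=$(printf '%s\n' '$RELEASE   = 42;' 'echo $RELEASE;')

# Replace the entire assignment line, leave everything else alone
result=$(printf '%s\n' "$input" |
  sed 's/^\$RELEASE[[:space:]]*=.*/REPLACEMENT/')

printf '%s\n' "$result"
```

Only the assignment line is replaced; the echo line passes through unchanged.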
And yes, general sed usage rules apply. Don't read a file with a stream editor (like sed) and redirect the output back into that same file. The shell truncates the file before sed gets to read it, and even if it seems to work you could easily knacker the file.
Either use the -i argument (works for sed) or pipe into an application like sponge (which soaks up all of its input before writing the output):
sed -i '...' file
sed '...' file | sponge file
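If neither -i nor sponge is available, a temporary file does the same job. A minimal sketch, with throwaway files created by mktemp standing in for the real file:

```shell
# Throwaway demo file standing in for the real one
file=$(mktemp)
printf 'old value\n' > "$file"

# Write the edited copy elsewhere, and only replace the original if sed succeeded
tmp=$(mktemp)
sed 's/old/new/' "$file" > "$tmp" && mv "$tmp" "$file"

cat "$file"
```

Note that mv replaces the file rather than rewriting it, so the original's permissions and hard links are not carried over.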
grep's name comes from the g/re/p ed command. Its primary purpose is to print the lines that match a regexp. It's not its role to edit the content of those lines. You have sed (the stream editor) or awk for that.
Now, some grep implementations, starting with GNU grep, added a -o option to print the matched portion of each line (what is matched by the regexp, not its capture groups). Some grep implementations, like GNU's again (with -P) or pcregrep, support PCREs for their regexps.
pcregrep actually added a -o<n> option to print the content of a capture group. So you could do:
pcregrep -o1 -o2 --om-separator=' ' '\.zoo\.(\d+).*:\s+(.*)'
But here, the obvious standard solution is to use sed:
sed -n 's/^.*\.zoo\.\([0-9]\{1,\}\).*:[[:space:]]\{1,\}/\1 /p'
Or if you want perl regexps, use perl:
perl -lne 'print "$1 $2" if /\.zoo\.(\d+).*:\s+(.*)/'
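As a quick check, here is the sed command run on a made-up line shaped like what the patterns above expect (the sample text is an assumption, since the original input is not shown here):

```shell
# Hypothetical input line: ".zoo.<digits>", then a colon, spaces and a value
line='foo.zoo.2.someval: 0.45654343'

# Keep the digits after .zoo. and whatever follows ":<spaces>"
result=$(printf '%s\n' "$line" |
  sed -n 's/^.*\.zoo\.\([0-9]\{1,\}\).*:[[:space:]]\{1,\}/\1 /p')

printf '%s\n' "$result"
```

This prints 2 0.45654343 on a single line.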
With GNU grep, if you don't mind the matches appearing on different lines, you can do:
$ grep -Po '\.zoo\.\K\d+|:\s+\K.*' < file
2
0.45654343
Note that while \K resets the start of the matched portion, that doesn't mean you can get away with the two parts of the alternation overlapping.
grep -Po '\.zoo\.(\K\d+|.*: *\K.*)'
would not work, just like echo foobar | grep -Po 'foo|foob' wouldn't work (at printing both foo and foob). foo|foob first matches foo, and then grep looks for other potential matches in the input after that foo, so starting at the b of bar, and can't find any more after that.
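The foobar case is easy to try. A small sketch, assuming a GNU grep built with -P support:

```shell
# PCRE alternation takes the first alternative that matches at a position,
# so foo wins at offset 0 and the search then resumes at "bar"
result=$(echo foobar | grep -Po 'foo|foob')

printf '%s\n' "$result"
```

Only foo is printed; foob never gets a chance because the text it needs has already been consumed.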
Above with grep -Po '\.zoo\.\K\d+|:\s+\K.*', we only look for :<spaces><anything> in the second part of the alternation. That does match in the part that is after .zoo.<digits>, but that also means it would find those :<spaces><anything> anywhere in the input, not only when they follow .zoo.<digits>.
There is a way to work around that though, using another PCRE special operator: \G. \G matches at the start of the subject. For a single match, that's equivalent to ^, but with multiple matches (think of sed/perl's g flag in s/.../.../g), like with -o where grep tries to find all the matches in the line, it also matches after the end of the previous match. So if you make it:
grep -Po '\.zoo\.\K\d+|(?!^)\G.*:\s+\K.*'
where (?!^) is a negative look-ahead operator that means not at the beginning of the line, that \G will only match after a previous successful (non-empty) match, so .*:\s+\K.* will only match if it follows a previous successful match, and that can only be the .zoo.<digits> one since the other part of the alternation matches until the end of the line.
However, on an input like:
.zoo.1.zoo.2 tar: blah
that would output:
1
2
blah
If you did not want that, you'd also want the first part of the alternation to only match at the beginning of the line. Something like:
grep -Po '^.*?\.zoo\.\K\d+|(?!^)\G.*:\s+\K.*'
That still outputs 2 on an input like .zoo.2 no colon character or .zoo.2 blah: (with nothing after the colon). You could work around that with a look-ahead operator in the first part of the alternation, and by looking for at least one non-space after :<spaces> (also using $ to avoid issues with non-characters):
grep -Po '^.*?\.zoo\.\K\d+(?=.*:\s+\S.*$)|(?!^)\G.*:\s+\K\S.*$'
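A sketch for trying that last expression, again assuming GNU grep with -P. The sample lines and the extract function name are made up for illustration:

```shell
# extract is a hypothetical wrapper around the final expression
extract() {
  grep -Po '^.*?\.zoo\.\K\d+(?=.*:\s+\S.*$)|(?!^)\G.*:\s+\K\S.*$'
}

# Well-formed line: both parts are printed, one per line
good=$(echo 'foo.zoo.2.someval: 0.45654343' | extract)

# No colon at all: nothing is printed (grep exits non-zero, hence || true)
bad=$(echo '.zoo.2 no colon character' | extract || true)

printf '%s\n' "$good"
printf '[%s]\n' "$bad"
```

The second printf wraps its value in brackets so that an empty result is visible as [].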
You'd probably need a few pages of comments to explain that regexp, so I would still go for the straightforward sed/perl solutions...
How it works
-n
This tells sed not to print anything unless we explicitly ask it to.
s/.*{\(.*\)}.*/\1/p
This substitute command captures as group 1 everything between two curly braces. The whole line is replaced with group 1, denoted \1. The p at the end tells sed that, if a match was made, it should print the result.
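For instance, on a made-up two-line input (my own sample, not from the question), only the line containing braces produces output:

```shell
# Hypothetical input: the first line has braces, the second does not
result=$(printf '%s\n' 'foo {bar} baz' 'no braces here' |
  sed -n 's/.*{\(.*\)}.*/\1/p')

printf '%s\n' "$result"
```

This prints bar; the second line is suppressed by -n because no substitution happened on it.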