I'm trying to match just the text contained within the HTML tags using Bash's built-in regex function:
string='<span class="circle"> </span>foo</span></span>'
regex='<span class="circle"> </span>(.+?)</span>'
[[ $string =~ $regex ]]
echo "${BASH_REMATCH[1]}"
But the match keeps capturing foo</span>
.
The internet is so crowded with examples of sed and grep that I haven't found much documentation on Bash's own regex.
Best Answer
There is a reason why the internet is packed with alternative approaches. I can't really think of any situation where you would be forced to use bash for this. Why not use one of the tools designed for the job?
Anyway, as far as I know there is no way of doing non-greedy matches using the
=~
operator. That's because it does not use bash's internal regex engine but your system's C one as defined inman 3 regex
. This is explained inman bash
:You can, however, do more or less what you want (bearing in mind that this is really not a good way of parsing HTML files) with a slightly different regex:
The above will return
foo
as expected.