sed Command – Print Only First Pattern Match of the Line

I have some data like this:

<td><a href="data1">abc</a> ... <a href="data2">abc</a> ... <a href="data3">abc</a>

(I will refer to the line above as data in the code below.)

I need data1, the text between the first pair of double quotes, so I run:

echo 'data' | sed 's/.*"\(.*\)".*/\1/'

but it always returns the last string between double quotes, i.e. in this case it returns data3 instead of data1.

To get data1, I end up doing:

echo 'data' | sed 's/.*"\(.*\)".*".*".*".*".*/\1/'

How do I get data1 without this much redundancy in sed?

Best Answer

The .* in the regex pattern is greedy: it matches as long a string as it can, so the quotes that end up being matched are the last pair on the line.
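To see the greedy behaviour in isolation, here is a small sketch that runs the pattern from the question against a short sample string. The variable name `last` is only illustrative.

```shell
# The leading .* grabs as much of the line as it can while still leaving
# one quoted pair for the rest of the pattern, so the LAST pair is captured.
last=$(printf '%s\n' '... "foo" ... "bar" ...' | sed 's/.*"\(.*\)".*/\1/')
printf '%s\n' "$last"   # prints: bar
```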

Since the delimiter here is a single character, we can use a negated bracket expression, [^"], which matches any character except a quote. Repeating it as [^"]* matches a run of characters that contains no quotes, so the match cannot skip past the first quote.

$ echo '... "foo" ... "bar" ...' | sed 's/[^"]*"\([^"]*\)".*/\1/'
foo
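Applied to the line from the question, the same substitution should give data1. The sketch below also adds `-n` and the `p` flag, which is an addition not in the original answer: with them, sed prints a line only when the substitution actually matched, so lines without any quotes are dropped rather than echoed unchanged. The variable names are illustrative.

```shell
# Sample line from the question.
line='<td><a href="data1">abc</a> ... <a href="data2">abc</a> ... <a href="data3">abc</a>'

# [^"]* cannot cross a quote, so the capture is the first quoted string.
first=$(printf '%s\n' "$line" | sed -n 's/[^"]*"\([^"]*\)".*/\1/p')
printf '%s\n' "$first"   # prints: data1

# A line with no quotes yields no output, because -n suppresses the
# default print and the p flag only fires on a successful substitution.
none=$(printf '%s\n' 'no quotes here' | sed -n 's/[^"]*"\([^"]*\)".*/\1/p')
```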

Another way would be to just remove everything up to the first quote, then remove everything starting from the (new) first quote:

$ echo '... "foo" ... "bar" ...' | sed 's/^[^"]*"//; s/".*$//'
foo
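To make the two steps easier to follow, this sketch runs them separately on the line from the question and keeps the intermediate result. Splitting them into two sed calls is only for illustration; the single command above does the same work in one pass.

```shell
line='<td><a href="data1">abc</a> ... <a href="data2">abc</a> ... <a href="data3">abc</a>'

# Step 1: delete everything up to and including the first quote.
step1=$(printf '%s\n' "$line" | sed 's/^[^"]*"//')
# step1 now begins with: data1">abc</a> ...

# Step 2: delete from the (new) first quote to the end of the line.
first=$(printf '%s\n' "$step1" | sed 's/".*$//')
printf '%s\n' "$first"   # prints: data1
```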

In Perl regexes, the * and + quantifiers can be made non-greedy by appending a question mark, so .*? would match anything, but as few characters as possible.
