Extract substring according to regexp with sed or grep

Tags: grep, regular expression, sed

In a (BSD) UNIX environment, I would like to capture a specific substring using a regular expression.

Assume that the dmesg command output would include the following line:

pass2: <Marvell Console 1.01> Removable Processor SCSI device

I would like to capture the text between the < and > characters, like

dmesg | <sed command>

should output:

Marvell Console 1.01

However, it should not output anything if the regex does not match. Many solutions, including sed -e 's/$regex/\1/', will output the whole input if no match is found, which is not what I want.

The corresponding regexp could be:
regex="^pass2\: \<(.*)\>"

How would I properly do a regex match using sed or grep? Note that the grep -P option is unavailable in my BSD UNIX distribution. The sed -E option is available, however.

Best Answer

Try this:

sed -nE 's/^pass2:.*<(.*)>.*$/\1/p'

Or POSIXly (-E has not made it to the POSIX standard yet as of 2019):

sed -n 's/^pass2:.*<\(.*\)>.*$/\1/p'

Output:

$ printf '%s\n' 'pass2: <Marvell Console 1.01> Removable Processor SCSI device' | sed -nE 's/^pass2:.*<(.*)>.*$/\1/p'
Marvell Console 1.01

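Because of -n and the p flag, nothing is printed for lines where the substitution does not match. Since the leading .* is greedy, this prints only the last occurrence of <...> on each line.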
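To illustrate the last-occurrence behaviour, here is a minimal sketch that compares the greedy pattern from the answer with a variant using negated bracket expressions to pick the first <...> instead. The function names extract_last and extract_first, and the sample line with two bracketed parts, are made up for illustration and are not from the original answer.

```shell
# Hypothetical helpers; the names are illustrative only.

# Last <...> on a pass2 line (the greedy POSIX pattern from the answer).
extract_last() {
    sed -n 's/^pass2:.*<\(.*\)>.*$/\1/p'
}

# First <...> on a pass2 line: [^<]* and [^>]* cannot run past a bracket.
extract_first() {
    sed -n 's/^pass2:[^<]*<\([^>]*\)>.*$/\1/p'
}

# Made-up sample line containing two bracketed parts.
line='pass2: <Marvell Console 1.01> Removable <Processor> SCSI device'

printf '%s\n' "$line" | extract_last             # prints: Processor
printf '%s\n' "$line" | extract_first            # prints: Marvell Console 1.01
printf '%s\n' 'pass3: <Other>' | extract_first   # prints nothing (no match)
```

Both variants use only basic regular expressions, so they should not depend on the -E option.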

Related Solutions

Text Processing – Can Grep Output Only Specified Groupings That Match

GNU grep has the -P option for perl-style regexes, and the -o option to print only what matches the pattern. These can be combined using look-around assertions (described under Extended Patterns in the perlre manpage) to remove part of the grep pattern from what is determined to have matched for the purposes of -o.

$ grep -oP 'foobar \K\w+' test.txt
bash
happy
$

The \K is the short-form (and more efficient form) of (?<=pattern) which you use as a zero-width look-behind assertion before the text you want to output. (?=pattern) can be used as a zero-width look-ahead assertion after the text you want to output.

For instance, if you wanted to match the word between foo and bar, you could use:

$ grep -oP 'foo \K\w+(?= bar)' test.txt

or (for symmetry)

$ grep -oP '(?<=foo )\w+(?= bar)' test.txt
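Where grep -P is unavailable (as on the BSD system in the question), a rough approximation of the foo/bar example can be sketched with sed -E. This is not an exact equivalent: it prints at most one word per line (the last match on that line), and [[:alnum:]_] stands in for \w. The function name between_foo_bar and the sample input are hypothetical.

```shell
# Hypothetical helper approximating: grep -oP 'foo \K\w+(?= bar)'
# Caveat: prints at most one word per line (the last match on that line).
between_foo_bar() {
    sed -nE 's/.*foo ([[:alnum:]_]+) bar.*/\1/p'
}

# Made-up sample input: the first line matches, the second does not.
printf '%s\n' 'x foo baz bar y' 'nothing here' | between_foo_bar   # prints: baz
```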

How to treat a file as a single line with grep to apply a regexp search pattern

Since a'r beat me to the sed solution, I'll just post the perl equivalent:

perl -ne 'print if/start/../end/'

It's a bit more verbose though.
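The sed solution referred to above is not quoted on this page. Presumably it resembles a range address like the following sketch, which prints every line from one matching start through the next one matching end, inclusive. The function name print_range and the sample input are illustrative assumptions.

```shell
# Hypothetical helper: print each block of lines from a line matching
# /start/ through the next line matching /end/, inclusive.
print_range() {
    sed -n '/start/,/end/p'
}

# Made-up sample input, one word per line.
printf '%s\n' a start b end c | print_range
# prints:
# start
# b
# end
```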
