Bash RegEx on OSX Vs Linux (oddities)


This is about Bash development and coding of portable Bash scripts that use RegEx.

Using Bash RegEx, on a Mac, I can do this:

coconut-mac$ a='bananacoconutman'; [[ "$a" =~ banana(.*?)man ]] && echo FOUND ${BASH_REMATCH[1]}
FOUND coconut

Nice. This is useful in many places.

When I try doing this, it fails:

coconut-mac$ a='<title>coconut</title>'; [[ "$a" =~ \<title\>(.*?)\</title\> ]] && echo FOUND ${BASH_REMATCH[1]}

The exact same command runs perfectly on the penguin:

coconut-linux$ a='<title>coconut</title>'; [[ "$a" =~ \<title\>(.*?)\</title\> ]] && echo FOUND ${BASH_REMATCH[1]}
FOUND coconut

  • Why?
  • How can I fix it so the script is portable?

EDIT: On the Mac:

OS X version: 10.8.2
Bash version: 4.2.37(2)-release

On Ubuntu 12.04 LTS:

Linux kernel version: 3.2.0-29-generic-pae
Linux version: Ubuntu 12.04.1 LTS
Bash version: 4.2.24(1)-release

Best Answer

On my Mac, info bash / =~ RET says:

An additional binary operator, `=~', is available, with the same precedence as `==' and `!='. When it is used, the string to the right of the operator is considered an extended regular expression and matched accordingly (as in regex(3)).

man 3 regex says:

A repetition operator (`?', `*', `+', or bounds) cannot follow another repetition operator. A repetition operator cannot begin an expression or subexpression or follow `^' or `|'.

I don't see any analogous documentation in GNU regex's man 3 regex or info regex.
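Since the two regex libraries may treat a pattern like (.*?) differently, it can help to tell "no match" apart from "pattern rejected". The Bash manual documents that [[ ... =~ ... ]] returns 2 when the regular expression is syntactically incorrect. Here is a minimal sketch of a probe; regex_status is a hypothetical helper name, not something from Bash itself, and whether a given platform rejects (.*?) outright or just fails to match is something I'm not asserting here:

```shell
#!/usr/bin/env bash
# regex_status is a hypothetical helper (not part of Bash).
# It prints the exit status of [[ string =~ pattern ]]:
#   0 = matched, 1 = no match, 2 = pattern is syntactically incorrect
regex_status() {
  local rc=0
  # $2 is left unquoted on purpose so it is treated as a regex, not a literal.
  [[ "$1" =~ $2 ]] || rc=$?
  echo "$rc"
}

regex_status 'abc' 'b'   # pattern matches
regex_status 'abc' 'z'   # valid pattern, no match
regex_status 'abc' '('   # unbalanced parenthesis, invalid ERE
```

Running the probe with the pattern from the question on each machine shows whether the local regex library rejects it or simply does not match.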

If I remove the ? from your (.*?) and do the following, it works on both OSes:

$ a='<title>coconut</title>'; [[ "$a" =~ \<title\>(.*)\</title\> ]] && echo FOUND ${BASH_REMATCH[1]}
FOUND coconut
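
One caveat with plain (.*) is that it is greedy, so with several tags on one line it captures up to the last closing tag. If the ? was there to get a shortest match, a negated character class is a portable way to approximate that in POSIX extended regular expressions. A rough sketch, assuming the title text never contains a < character; extract_title is a made-up function name for illustration, and the pattern is kept in a variable so the shell does not have to parse the < and > characters inline:

```shell
#!/usr/bin/env bash
# extract_title is a hypothetical helper, not from the original post.
# Assumption: the text between the tags contains no '<' character.
extract_title() {
  # [^<]* stops at the first '<', which emulates a non-greedy match
  # without relying on the non-POSIX '*?' syntax.
  local re='<title>([^<]*)</title>'
  if [[ "$1" =~ $re ]]; then
    printf '%s\n' "${BASH_REMATCH[1]}"
  else
    return 1
  fi
}

# With two tags on one line, only the first title is captured.
extract_title '<title>coconut</title><title>mango</title>'
```

This only uses bracket expressions and a single repetition operator, so it stays within what both man pages describe.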