Bash – What regular expression engine type does bash use

bash, regular expression

I use RegEx Buddy to prototype and debug my regular expressions. It lets me choose between a number of regular expression engine types (.NET, Java, Perl, GNU BRE, GNU ERE, POSIX BRE, POSIX ERE, etc.).

What regular expression engine does bash use (for example, in if and case statements)? I'm running CentOS 5.5 32-bit and bash 3.2.25(1):

[kevin@mon01 scratch]$ bash --version
GNU bash, version 3.2.25(1)-release (i686-redhat-linux-gnu)
Copyright (C) 2005 Free Software Foundation, Inc.

I'm guessing it'll be GNU BRE or GNU ERE?

Best Answer

bash (like POSIX shells in general) does not use regular expressions in the case statement; it uses glob patterns instead.

There's limited support for regular expressions via the =~ operator inside [[ ... ]]; see http://mywiki.wooledge.org/BashGuide/Patterns, which says that bash uses Extended Regular Expressions (ERE).
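
To make the difference concrete, here is a minimal sketch that contrasts a glob pattern in case with an ERE in [[ =~ ]]. The function names classify_glob and is_phone and the sample patterns are made up for illustration. The regex is kept in a variable because quoting the right-hand side of =~ makes bash (3.2 and later) treat it as a literal string.

```shell
#!/usr/bin/env bash

# classify_glob (hypothetical helper): case matches glob patterns,
# so * means "any string" and [0-9] is a single-character class.
classify_glob() {
  case "$1" in
    *.txt)  echo "text" ;;
    [0-9]*) echo "starts-with-digit" ;;
    *)      echo "other" ;;
  esac
}

# is_phone (hypothetical helper): [[ =~ ]] matches a POSIX ERE,
# so {3} and ? are repetition operators here.
is_phone() {
  local re='^[0-9]{3}[ -]?[0-9]{4}$'
  [[ $1 =~ $re ]]
}

classify_glob "notes.txt"
is_phone "555-1234" && echo "phone number matched"
```

Note that the same characters mean different things in the two contexts: in the glob, * stands for any string on its own, while in the ERE it would repeat the preceding item.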

Related Solutions

What does \? mean in a regular expression

It's like ? in many other regular expression engines, and means "match zero or one of whatever came before it".

In your example, the \? applies to the [ -], so the pattern matches a space or a minus at that position, but that character is optional.

So any of these will match:

555 1234
555-1234
5551234

The reason it's written as \? rather than ? is for backwards compatibility.

The original version of grep used a different type of regular expression called a "basic regular expression" where ? just meant a literal question mark.

To give GNU grep the zero-or-one operator without breaking scripts that relied on ? being a literal question mark, it was added with the \? syntax.

Note that grep has an -E option which makes it use the more common type of regular expression, called "extended regular expressions".

man 1 grep:

   -E, --extended-regexp
          Interpret PATTERN as an extended regular expression
          (ERE, see below).  (-E is specified by POSIX.)

   -G, --basic-regexp
          Interpret PATTERN as a basic regular expression (BRE, see below).
          This is the default.

...

Repetition
    A regular expression may be followed by one of several repetition operators:
    ?      The preceding item is optional and matched at most once.

...

    grep understands three different versions of regular expression syntax:
    “basic,” “extended” and “perl.”

...

Basic vs Extended Regular Expressions
    In basic regular expressions the meta-characters ?, +, {, |, (, and )
    lose their special meaning; instead use the backslashed versions
    \?, \+, \{, \|, \(, and \).
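
As a quick check of the BRE/ERE difference, the following sketch runs the phone-number pattern both ways. It assumes GNU grep (the \? operator in basic mode is a GNU extension); the variable names and sample input are illustrative only.

```shell
#!/usr/bin/env bash

numbers='555 1234
555-1234
5551234'

# Basic (default) mode: the optional operator is written \?
bre_count=$(printf '%s\n' "$numbers" | grep -c '^555[ -]\?1234$')

# Extended mode (-E): the optional operator is a bare ?
ere_count=$(printf '%s\n' "$numbers" | grep -E -c '^555[ -]?1234$')

echo "BRE matches: $bre_count, ERE matches: $ere_count"

# In basic mode a bare ? is just a literal question mark
literal=$(printf '%s\n' '555?1234' '5551234' | grep '^555?1234$')
echo "bare ? in BRE matched: $literal"
```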

Further info:

grep -E option and egrep
GNU grep - Basic vs Extended
Regexp Syntax Summary
Regular Expression - Wikipedia
Why do some regex commands have opposite interpretations of '\' with various characters?

Bash – Regular expression in bash script

From man 7 regex:

A bracket expression is a list of characters enclosed in "[]". …
… To include a literal '-', make it the first or last character…. [A]ll other special characters, including '\', lose their special significance within a bracket expression.

Trying the regexp with egrep gives an error:

$ echo "username : username usergroup" | egrep "^([a-zA-Z0-9\-_]+ : [a-zA-Z0-9\-_]+) (usergroup)$"
egrep: Invalid range end

Here is a simpler version, that also gives an error:

$ echo 'hi' | egrep '[\-_]'
egrep: Invalid range end

Since \ is not special inside a bracket expression, \-_ is a range from \ to _, just as a-z is a range in [a-z]. To match a literal -, put it at the end, like [_-], or:

echo "username : username usergroup" | egrep "^([a-zA-Z0-9_-]+ : [a-zA-Z0-9_-]+) (usergroup)$"
username : username usergroup

This should work regardless of your libc version (in either egrep or bash).
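
For the bash side, a minimal sketch of the same corrected pattern with [[ =~ ]] might look like this. The function name parse_line and the variables pair and group are hypothetical; the regex is stored in a variable so that it is not quoted on the right-hand side of =~.

```shell
#!/usr/bin/env bash

# parse_line (hypothetical helper): matches "name : name usergroup"
# and stores the two capture groups in the globals pair and group.
parse_line() {
  # The literal - sits last in each bracket expression, so it is not a range.
  local re='^([a-zA-Z0-9_-]+ : [a-zA-Z0-9_-]+) (usergroup)$'
  if [[ $1 =~ $re ]]; then
    pair=${BASH_REMATCH[1]}
    group=${BASH_REMATCH[2]}
    return 0
  fi
  return 1
}

parse_line 'username : username usergroup' && echo "pair=$pair group=$group"
```

BASH_REMATCH[0] holds the whole match, and indexes 1 and up hold the parenthesized subexpressions in order.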

edit: This actually depends on your locale settings too. The manpage does warn about this:

Ranges are very collating-sequence-dependent, and portable programs should avoid relying on them.

For example:

$ echo '\_' | LC_ALL=en_US.UTF8 egrep '[\-_]'
egrep: Invalid range end
$ echo '\_' | LC_ALL=C egrep '[\-_]'
\_

Of course, even though it didn't error, it isn't doing what you want:

$ echo '\^_' | LC_ALL=C egrep '^[\-_]+$'
\^_

It's a range, which in ASCII includes \, ], ^, and _.
