It's like ? in many other regular expression engines, and means "match zero or one of whatever came before it". In your example, the \? is applied to the [ -], meaning it matches a space or a hyphen, but that character is optional.
So any of these will match:
555 1234
555-1234
5551234
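A quick way to check this is to feed those strings to grep. This is a minimal sketch assuming GNU grep, where \? is available in the default basic syntax. The pattern 555[ -]\?1234 and the function name match_phone are made up for illustration, since the original pattern may differ.

```shell
# Hypothetical pattern for illustration: "555", an optional space or hyphen, then "1234".
# Assumes GNU grep; \? is a GNU extension to basic regular expressions.
match_phone() {
  grep '555[ -]\?1234'
}

# The first three lines match; the underscore variant does not.
printf '%s\n' '555 1234' '555-1234' '5551234' '555_1234' | match_phone
```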
The reason it's written as \? rather than ? is backwards compatibility. The original version of grep used a different type of regular expression called a "basic regular expression", where ? just meant a literal question mark. GNU grep added the zero-or-one functionality but had to use the \? syntax so that scripts that used ? still worked as expected.
Note that grep has an -E option, which makes it use the more common type of regular expression, called "extended regular expressions". From man 1 grep:
-E, --extended-regexp
Interpret PATTERN as an extended regular expression
(ERE, see below). (-E is specified by POSIX.)
-G, --basic-regexp
Interpret PATTERN as a basic regular expression (BRE, see below).
This is the default.
...
Repetition
A regular expression may be followed by one of several repetition operators:
? The preceding item is optional and matched at most once.
...
grep understands three different versions of regular expression syntax:
“basic,” “extended” and “perl.”
...
Basic vs Extended Regular Expressions
In basic regular expressions the meta-characters ?, +, {, |, (, and )
lose their special meaning; instead use the backslashed versions
\?, \+, \{, \|, \(, and \).
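To see the two syntaxes side by side, the sketch below runs the same input through grep in basic and extended mode. It assumes GNU grep; the sample strings and the helper name count_matches are invented for illustration.

```shell
# Count how many sample lines a given grep invocation matches.
# The sample input is made up for illustration.
count_matches() {
  printf '%s\n' '555 1234' '555-1234' '5551234' '555-?1234' | grep -c "$@"
}

count_matches    '555[ -]\?1234'   # BRE: \? is the optional operator
count_matches -E '555[ -]?1234'    # ERE: plain ? is the optional operator
count_matches    '555[ -]?1234'    # BRE: plain ? is a literal question mark
```

With GNU grep the three calls should report 3, 3 and 1 matching lines: the last pattern only matches the line that really contains a question mark.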
Further info:
From man 7 regex:
A bracket expression is a list of characters enclosed in "[]". …
… To include a literal '-', make it the first or last character…. [A]ll other special characters, including '\', lose their special significance within a bracket expression.
Trying the regexp with egrep gives an error:
$ echo "username : username usergroup" | egrep "^([a-zA-Z0-9\-_]+ : [a-zA-Z0-9\-_]+) (usergroup)$"
egrep: Invalid range end
Here is a simpler version, that also gives an error:
$ echo 'hi' | egrep '[\-_]'
egrep: Invalid range end
Since \ is not special inside a bracket expression, \-_ is a range, just like [a-z] would be. You need to put your - at the end, like [_-], or in the full command:
$ echo "username : username usergroup" | egrep "^([a-zA-Z0-9_-]+ : [a-zA-Z0-9_-]+) (usergroup)$"
username : username usergroup
This should work regardless of your libc version (in either egrep or bash).
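As a small check of the character class with the hyphen placed last, the sketch below tests a few names with grep -E, which uses the same extended syntax as egrep. The sample names and the function name is_valid_name are hypothetical.

```shell
# Succeeds if the argument consists only of letters, digits, "_" and "-".
# The hyphen is last in the bracket expression, so it is taken literally.
is_valid_name() {
  printf '%s\n' "$1" | grep -Eq '^[a-zA-Z0-9_-]+$'
}

for name in user_name user-name 'user\name'; do
  if is_valid_name "$name"; then
    printf 'ok:  %s\n' "$name"
  else
    printf 'bad: %s\n' "$name"
  fi
done
```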
edit: This actually depends on your locale settings too. The manpage does warn about this:
Ranges are very collating-sequence-dependent, and portable programs should avoid relying on them.
For example:
$ echo '\_' | LC_ALL=en_US.UTF8 egrep '[\-_]'
egrep: Invalid range end
$ echo '\_' | LC_ALL=C egrep '[\-_]'
\_
Of course, even though it didn't error, it isn't doing what you want:
$ echo '\^_' | LC_ALL=C egrep '^[\-_]+$'
\^_
It's a range, which in ASCII includes \, ], ^, and _.
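The contents of that range can be checked one character at a time. This sketch assumes GNU grep and forces the C locale so the range follows ASCII order; the helper name in_range is made up for illustration.

```shell
# Succeeds if the single character given is matched by the range \-_
# Assumes the C locale, where the range covers ASCII 0x5C through 0x5F.
in_range() {
  printf '%s\n' "$1" | LC_ALL=C grep -Eq '^[\-_]$'
}

for ch in '\' ']' '^' '_' '[' '-'; do
  if in_range "$ch"; then
    printf 'in range:     %s\n' "$ch"
  else
    printf 'not in range: %s\n' "$ch"
  fi
done
```

Note that a literal hyphen is not in the range, which is why the pattern does not do what was intended even when it runs without an error.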
bash (and POSIX shells in general) does not use regular expressions in the case statement, but rather glob patterns. Bash has limited support for regular expressions through the =~ operator inside [[ ... ]]; see details at: http://mywiki.wooledge.org/BashGuide/Patterns, which says that bash uses Extended Regular Expressions (ERE).
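The difference is easy to see when the same bracket expression is used both ways. This is a minimal sketch assuming bash (the =~ operator is not available in plain POSIX sh); the function names glob_kind and is_number are hypothetical.

```shell
#!/usr/bin/env bash

# Glob pattern: [0-9]* means "one digit, then anything at all".
glob_kind() {
  case $1 in
    [0-9]*) echo digit-first ;;
    *)      echo other ;;
  esac
}

# ERE: ^[0-9]+$ means "one or more digits and nothing else".
# Keeping the regex in a variable avoids shell quoting surprises with =~.
is_number() {
  local re='^[0-9]+$'
  [[ $1 =~ $re ]]
}

glob_kind 7abc                                     # the glob accepts trailing letters
is_number 7abc || echo "7abc is not all digits"    # the regex does not
is_number 123  && echo "123 is all digits"
```

In a glob, * stands for any string by itself, while in a regular expression * and + repeat the item before them, so the two patterns above accept different inputs.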