Bash – Regular expression using \\ vs using \

bash, quoting, regular expression, shell

Why does

grep e\\.g\\. <<< "this is an e.g. wow"

and

grep e\.g\. <<< "this is an e.g. wow"

do the same thing?

If I add a third backslash, the result is still the same. BUT once I add a fourth backslash, it no longer works. This comes from a question on an old exam for a class, which asked whether the version with two backslashes would output the line containing "e.g.". I originally thought it wouldn't work, but I tried it to make sure and it did. What is the explanation?

Best Answer

First, note that the single-backslash version matches too much:

$ echo $'eegg \n e.g.' | grep e\.g\.
eegg
 e.g.

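As far as Bash is concerned, an unquoted escaped period is the same as a plain period: Bash removes the backslash and passes only the period on to grep. For grep, an unescaped period matches any single character, which is why eegg matches as well.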
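One way to check this is to print the argument exactly as a command would receive it. This is only a minimal sketch; show_arg is a made-up helper name, not a standard command:

```shell
# show_arg is a hypothetical helper: it prints its first argument
# exactly as the shell passed it, after backslash processing.
show_arg() { printf '%s\n' "$1"; }

show_arg e\.g\.                      # prints: e.g.  (backslashes removed by the shell)

# grep therefore gets the regex e.g. and "eegg" matches it:
printf 'eegg\n' | grep -c e\.g\.     # prints: 1
```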

Now, consider:

$ echo $'eegg \n e.g.' | grep e\\.g\\.
 e.g.
$ echo $'eegg \n e.g.' | grep e\\\.g\\\.
 e.g.
$ echo $'eegg \n e.g.' | grep e\\\\.g\\\\.
$

When Bash sees a double backslash, it reduces it to a single backslash and passes that on to grep. In the first of the three tests above, grep therefore sees what we want: a single backslash before each period. Thus, this does the right thing.

With a triple backslash, Bash reduces the first two to a single backslash. It then sees a backslash followed by a period. Since an escaped period has no special meaning to Bash, this is reduced to a plain period. The result is that grep again sees, as we want, a single backslash before a period.

With four backslashes, Bash reduces each pair to a single backslash and passes two backslashes and a period on to grep. grep treats the two backslashes as one literal backslash and the period as "any character". Unless the input has a literal backslash followed by some character, there are no matches.

To illustrate that last point, remember that inside single quotes, all characters are literal. Thus, given the following three input lines, the grep command matches only the line that contains literal backslashes:

$ echo 'eegg
e.g.
e\.g\.' |  grep e\\\\.g\\\\.
e\.g\.
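The three cases can also be compared side by side by giving grep, in single quotes, the pattern it ends up with in each case. This is a sketch under that assumption; the names sample and matching_lines are invented for illustration:

```shell
# Three sample lines: no periods, real periods, literal backslashes.
sample='eegg
e.g.
e\.g\.'

# matching_lines is a hypothetical helper: it prints the sample lines
# matched by the regex given as its first argument.
matching_lines() { printf '%s\n' "$sample" | grep -- "$1"; }

matching_lines 'e.g.'       # what grep gets from e\.g\.      -> eegg and e.g.
matching_lines 'e\.g\.'     # from e\\.g\\. or e\\\.g\\\.     -> e.g.
matching_lines 'e\\.g\\.'   # from e\\\\.g\\\\.               -> e\.g\.
```

Because the patterns are single-quoted, Bash does not touch them, so each call shows grep's behavior alone.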

Summary of Bash's behavior

For Bash, the rules for unquoted text are:

  • Two backslashes are reduced to a single backslash.

  • A backslash in front of a normal character, like a period, yields just the normal character (the period).

Thus:

$ echo \. \\. \\\. \\\\.
. \. \. \\.

There is a simple way to avoid all this confusion: on the Bash command line, regular expressions should be placed in single-quotes. Inside single quotes, Bash leaves everything alone.

$ echo '\. \\. \\\. \\\\.'  # Note single-quotes
\. \\. \\\. \\\\.
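Applied to the original grep command, that advice might look like the sketch below. The bracket-expression and -F variants are alternatives not mentioned above, and sample_input is a made-up helper name:

```shell
# sample_input is a hypothetical helper that prints two test lines.
sample_input() { printf 'eegg\n e.g.\n'; }

sample_input | grep 'e\.g\.'      # single quotes: grep receives e\.g\. unchanged
sample_input | grep 'e[.]g[.]'    # a period inside [] is literal, no backslash needed
sample_input | grep -F 'e.g.'     # -F treats the pattern as a fixed string, not a regex
```

Each of the three commands should print only the " e.g." line, since none of them lets the period act as a wildcard.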