Shell – Number of backslashes needed for escaping regex backslash on the command-line

Tags: command line, quoting, regular expression, shell

I recently had trouble with a regex on the command line and found
that a literal backslash can be matched with different numbers of
backslash characters. The number depends on the quoting used for
the regex (none, single quotes, double quotes). The following
bash session shows what I mean:

echo "#ab\\cd" > file
grep -E ab\cd file
grep -E ab\\cd file
grep -E ab\\\cd file
grep -E ab\\\\cd file
#ab\cd
grep -E ab\\\\\cd file
#ab\cd
grep -E ab\\\\\\cd file
#ab\cd
grep -E ab\\\\\\\cd file
#ab\cd
grep -E ab\\\\\\\\cd file
grep -E "ab\cd" file
grep -E "ab\\cd" file
grep -E "ab\\\cd" file
#ab\cd
grep -E "ab\\\\cd" file
#ab\cd
grep -E "ab\\\\\cd" file
#ab\cd
grep -E "ab\\\\\\cd" file
#ab\cd
grep -E "ab\\\\\\\cd" file
grep -E 'ab\cd' file
grep -E 'ab\\cd' file
#ab\cd
grep -E 'ab\\\cd' file
#ab\cd
grep -E 'ab\\\\cd' file

This means that:

  • with no quotes, I can match a backslash with 4-7 actual backslashes
  • with double quotes, I can match a backslash with 3-6 actual backslashes
  • with single quotes, I can match a backslash with 2-3 actual backslashes

I understand that one extra backslash is ignored by the shell (from
the bash man page):

"A non-quoted backslash (\) is the escape character. It preserves
the literal value of the next character that follows"

This does not apply to the single-quoted examples, because no
escaping is done in single quotes.

And one additional backslash is ignored by the grep command ("\c"
is just "c" escaped, but this is just the same as "c", because "c"
does not have a special meaning in a regex).
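
A minimal sketch of the single-quoted case, for illustration only. The helper show is hypothetical (not part of the original session); it just prints its argument exactly as the shell delivers it, so the shell's part can be inspected without grep getting involved. The temporary file stands in for the file used above.

```shell
# Hypothetical helper: print the first argument exactly as the shell passes it on.
show() { printf '%s\n' "$1"; }

# Inside single quotes the shell removes nothing, so every typed backslash arrives.
show 'ab\\cd'     # prints ab\\cd
show 'ab\\\cd'    # prints ab\\\cd

# End-to-end check with two backslashes: the regex \\ matches one literal backslash.
tmp=$(mktemp)
printf '%s\n' '#ab\cd' > "$tmp"
grep -E 'ab\\cd' "$tmp"    # prints #ab\cd
rm -f "$tmp"
```

With three backslashes grep receives \\ followed by \c, which is the extra backslash that grep itself ignores.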

This explains the behaviour of the example with single quotes, but
I don't really understand the other two examples, especially why
there is a difference between non-quoted and double-quoted strings.

Again, a quote from the bash man page:

"Enclosing characters in double quotes preserves the literal value
of all characters within the quotes, with the exception of $, `, \,
and, when history expansion is enabled, !."

I tried the same with GNU awk (e.g. awk /ab\cd/{print} file),
with the same results.

Perl, however, shows different results (using e.g.
perl -ne "/ab\\cd/"\&\&print file):

  • with no quotes, I can match a backslash with 4-5 actual backslashes
  • with double quotes, I can match a backslash with 3-4 actual backslashes
  • with single quotes, I can match a backslash with 2 actual backslashes

Can anyone explain the difference between non-quoted and double-quoted
regex strings on the command line for grep and awk?
I'm not that interested in an explanation of Perl's behaviour, since I usually don't use Perl one-liners.

Best Answer

For the unquoted example, the shell turns each \\ pair into one backslash before grep sees the pattern. So 4 typed backslashes reach grep as two, which grep reads as a single literal backslash. 6 typed backslashes reach grep as three, which grep reads as one literal backslash plus \c, and \c is the same as c. An odd extra backslash (5 or 7 typed) does not change anything, because the shell already translates the trailing \c to c. Eight typed backslashes become four in grep, which match two literal backslashes, so the pattern no longer matches.
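
To see the shell's half of this without grep, here is a small sketch. The helper show is a hypothetical name introduced for illustration; it prints the argument after the shell has done its unquoted backslash removal, which is exactly the pattern grep would receive.

```shell
# Hypothetical helper: print the first argument exactly as the shell passes it on.
show() { printf '%s\n' "$1"; }

show ab\\\\cd        # 4 typed: prints ab\\cd   (grep: one literal backslash, match)
show ab\\\\\cd       # 5 typed: prints ab\\cd   (shell already turned \c into c)
show ab\\\\\\cd      # 6 typed: prints ab\\\cd  (grep: \\ then \c, match)
show ab\\\\\\\cd     # 7 typed: prints ab\\\cd
show ab\\\\\\\\cd    # 8 typed: prints ab\\\\cd (grep: two literal backslashes, no match)
```

Newer GNU grep releases may additionally print a warning about the stray backslash in \c; as far as I know that does not change which lines match.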

For the example in double quotes, note what follows your second quote from the bash manpage:

The backslash retains its special meaning only when followed by one of the following characters: $, `, ", \, or newline.

I.e. when you give an odd number of backslashes, the sequence ends in \c. In the unquoted case the shell would reduce that to c, but inside double quotes the backslash loses its special meaning before c, so \c is passed to grep unchanged. That is why the range of "possible" backslashes (i.e. those that make up a pattern matching your example file) slides down by one.
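
The same kind of sketch for double quotes, again using a hypothetical show helper that only prints what the shell hands over. Compare the odd counts with the unquoted case: here the trailing \c survives the shell.

```shell
# Hypothetical helper: print the first argument exactly as the shell passes it on.
show() { printf '%s\n' "$1"; }

show "ab\\cd"        # 2 typed: prints ab\cd    (grep: \c is c, pattern abcd, no match)
show "ab\\\cd"       # 3 typed: prints ab\\cd   (\\ became \, \c kept as typed, match)
show "ab\\\\cd"      # 4 typed: prints ab\\cd
show "ab\\\\\cd"     # 5 typed: prints ab\\\cd  (grep: \\ then \c, match)
show "ab\\\\\\cd"    # 6 typed: prints ab\\\cd
show "ab\\\\\\\cd"   # 7 typed: prints ab\\\\cd (grep: two literal backslashes, no match)
```

So in double quotes 3 to 6 typed backslashes produce a matching pattern, against 4 to 7 without quotes.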
