Bash – have to quote an escaped character in a regular expression for grep, but not on online regex engines

bash, grep, regular expression

I'm sure some version of this question has been asked and answered before, but I've looked around and haven't found the exact answer. Perhaps someone here can help the lightbulb go on for me. I'm on a Mac with Mojave 10.14.6 and bash 3.2.57(1)-release.

I'm learning the basics of regular expressions by following along with an online tutorial, and practicing both on the online site https://regexr.com, and by using grep in bash on my local machine.

I'm practicing with a small text file (called small.txt) with three things in it:

9.00
9-00
9500

I understand that the . wildcard will match any one character at that spot. So, in the online regex engine (JavaScript) that I'm using, /9.00/g will match all three strings: 9.00, 9-00, and 9500.

It's the same if I use grep on the command line:

~/bin $ grep 9.00 small.txt
9.00
9-00
9500

So far, so good. The tutorial says that to turn the . from a metacharacter into a literal, you have to escape it. Okay, so putting /9\.00/g into the online regex box will only match 9.00, as expected, not 9-00 or 9500. Great.

However, if I enter that same syntax into grep on the command line, I get an unexpected result:

~/bin $ grep 9\.00 small.txt
9.00
9-00
9500

Same as before. To get grep to work, I either have to double quote the whole string:

~/bin $ grep "9\.00" small.txt
9.00

or just double quote the escaped character:

~/bin $ grep 9"\."00 small.txt
9.00

There may well be some other quoting choices that I could make that would also give me the correct result.

This is making it hard for me to wrap my head around the basics of regular expressions, because, clearly, I first have to understand how grep in the shell differs from traditional regular expression syntax. It's hard enough learning all of the rules for regular expressions, but when you add in the differences between classic regular expressions and the behavior of the bash shell, my head explodes.

Anyway, I'm wondering if there is a clear explanation that will clear this up for me and set me on the path to properly learning regular expressions that I can use with grep on the command line.

(None of the courses on regular expressions point out the differences between the command line version of grep with bash, and the "pure" regular expression syntax that you see on the online regex testers.) I know that there are differences between engines at the advanced level, but this seems to be something so basic that I feel I must be missing something.

Thanks.

Best Answer

Why? Because your shell interprets some special characters itself, such as the \ in your example, before grep ever sees them.

You are running into trouble because you do not protect the string that you pass as an argument to grep from the shell.

Several solutions:

  • single-quoting the string,
  • double-quoting the string (inside double quotes the shell still interprets several things, such as $variables, before sending the resulting string to the command),
  • or not quoting at all (which I strongly advise against) but adding backslashes in the right places to prevent the shell from interpreting the following characters before sending them to the command.
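To see what each option actually hands to the command, you can replace grep with printf, which just prints its argument. This is only a small sketch; the variable names are mine and purely illustrative.

```shell
# printf '%s' prints its argument unchanged, so each variable holds
# exactly what grep would have received as its pattern.
unquoted=$(printf '%s' 9\.00)     # the shell removes the backslash
double=$(printf '%s' "9\.00")     # \. is not special inside double quotes
single=$(printf '%s' '9\.00')     # single quotes keep everything literal
escaped=$(printf '%s' 9\\.00)     # unquoted \\ becomes one backslash

printf 'unquoted: %s\n' "$unquoted"   # unquoted: 9.00
printf 'double:   %s\n' "$double"     # double:   9\.00
printf 'single:   %s\n' "$single"     # single:   9\.00
printf 'escaped:  %s\n' "$escaped"    # escaped:  9\.00
```

Only the unquoted form loses the backslash, which is why grep 9\.00 behaved like grep 9.00 in the question.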

I recommend protecting the string with single quotes, as that keeps everything inside them literal:

grep '9\.0' # sends those 4 characters (9, \, ., 0) to grep as a single argument

The shell passes the single-quoted string literally.

Note: The only thing you can't include inside a single-quoted shell string is a single quote, as it ends the single-quoting. To include one, first end the single-quoting, immediately add an escaped single quote \' (or one between double quotes: "'"), and then immediately reenter single-quoting to continue the string. For example, to have the shell run grep with the pattern a'b, write the argument as 'a'\''b' so that the shell sends a'b to grep: grep 'a'\''b' or grep 'a'"'"'b'
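A quick way to convince yourself that both spellings produce the same argument is to print them. This is a minimal sketch with illustrative variable names:

```shell
# Both forms close the single quotes, add one literal ', and reopen them.
with_backslash=$(printf '%s' 'a'\''b')     # uses \' between the quoted parts
with_dquotes=$(printf '%s' 'a'"'"'b')      # uses "'" between the quoted parts

printf '%s\n' "$with_backslash" "$with_dquotes"   # prints a'b twice
```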

If you insist on not using quoting, you need to type \\ for the shell to send a single \ to grep.

grep 9\\.0  # i.e. a 9, a pair \\, a ., and a 0; the shell turns the pair \\ into a literal \
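Here is a small end-to-end check against the three lines from the question. It is only a sketch: it recreates small.txt in a temporary file (the mktemp template and the variable names are my own choices) and uses grep -c to count the matching lines.

```shell
# Recreate the three-line small.txt from the question in a temporary file.
tmp=$(mktemp "${TMPDIR:-/tmp}/small.XXXXXX")
printf '%s\n' 9.00 9-00 9500 > "$tmp"

# grep -c prints the number of matching lines.
unquoted_count=$(grep -c 9\.00 "$tmp")    # grep receives 9.00
escaped_count=$(grep -c 9\\.00 "$tmp")    # grep receives 9\.00
single_count=$(grep -c '9\.00' "$tmp")    # grep receives 9\.00

rm -f "$tmp"
printf 'unquoted=%s escaped=%s single=%s\n' \
    "$unquoted_count" "$escaped_count" "$single_count"
# prints: unquoted=3 escaped=1 single=1
```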

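If you use double quotes, you need to take into account that the shell still interprets several things first ($vars, command substitutions, and some backslash sequences). Inside double quotes, a backslash is special only before $, `, ", \ or a newline; before any other character it is kept as is. So "\w" reaches the command as \w and "\." as \. (which is why your grep "9\.00" worked), while "\\" becomes a single \. Outside quotes it is different: an unquoted \w is seen as a plain w, and \\ as a single \.

grep "9\\.0"  # sends 9\.0 to grep, the same as "9\.0" or the unquoted 9\\.0
    # but double-quoting allows you to have spaces, etc., inside the string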
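
To check how backslashes behave inside double quotes, you can print a few double-quoted strings and look at what comes out. This is a sketch with illustrative variable names; it relies on the standard rule that, inside double quotes, a backslash is only special before $, `, ", \ or a newline.

```shell
# Each variable holds what a command would receive for that double-quoted string.
dot=$(printf '%s' "9\.0")        # backslash kept:    9\.0
word=$(printf '%s' "\w")         # backslash kept:    \w
pair=$(printf '%s' "9\\.0")      # \\ collapses to \: 9\.0
dollar=$(printf '%s' "\$HOME")   # \$ yields a literal $, so no expansion: $HOME

printf '%s\n' "$dot" "$word" "$pair" "$dollar"
```

So "9\.0" and "9\\.0" end up as the same pattern, and the extra backslash is only needed before the handful of characters listed above.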