Bash – Forcing Bash to use Perl RegEx Engine

bash, perl, regular expression

As you may already know, many features that modern regex engines support (backreferences, lookaround assertions, etc.) are not supported by the Bash regex engine. Below is a simple Bash script I wrote to explain what my end goal is:

#!/bin/bash

# Make sure exactly two arguments are passed.
if [ $# -ne 2 ]
then
    echo "Usage: match [string] [pattern]"
    # The script is meant to be sourced (". match.sh"), so return instead of exit.
    return 1
fi

variable=${1}
pattern=${2}

if [[ ${variable} =~ ${pattern} ]]
then
    echo "true"
else
    echo "false"
fi

So for instance, something like the following command will return false:

. match.sh "catfish" "(?<=cat)fish"

whereas the exact same expression will find a match when used in a Perl or a JavaScript regex tester.

Backreferences (e.g. (expr1)(expr2)[ ]\1\2) won't match either.

I have simply come to the conclusion that my problem will only be solved when forcing bash to use a Perl-compatible RegEx engine.
Is this doable? If so, how would I go about performing the procedure?

Best Answer

Bash doesn't support a method for you to do this at this time. You're left with the following options:

1. Use Perl
2. Use grep [-P|--perl-regexp]
3. Use Bash functionality to code it
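
As a rough sketch of the third option, some simple lookbehind and backreference checks can be emulated with ERE capture groups and the BASH_REMATCH array. The function names below are hypothetical, and this only covers narrow cases, not Perl-compatible regex in general.

```bash
#!/bin/bash

# Emulate (?<=cat)fish: match "cat" as ordinary text and capture what follows.
# lookbehind_fish is a hypothetical helper name.
lookbehind_fish() {
    local re='cat(fish)'
    if [[ $1 =~ $re ]]; then
        echo "${BASH_REMATCH[1]}"   # the part a lookbehind would have matched
    else
        return 1
    fi
}

# Emulate a backreference on a two-word string: capture both words with ERE,
# then compare the captures in Bash instead of using \1.
# repeated_word is a hypothetical helper name.
repeated_word() {
    local re='^([[:alnum:]_]+) ([[:alnum:]_]+)$'
    [[ $1 =~ $re ]] && [[ ${BASH_REMATCH[1]} == "${BASH_REMATCH[2]}" ]]
}

lookbehind_fish "catfish"                            # prints: fish
repeated_word "hello hello" && echo "repeated"       # prints: repeated
repeated_word "hello world" || echo "not repeated"   # prints: not repeated
```

The pattern is kept in a variable and expanded unquoted on the right of =~, because quoting it there would make Bash treat it as a literal string.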

I think I would go with #2 and try to use grep to get the functionality I want. Lookaround assertions, for example, work with grep like this:

$ echo 'BEGIN `helloworld` END' | grep -oP '(?<=BEGIN `).*(?=` END)'
helloworld

-o, --only-matching       show only the part of a line matching PATTERN
-P, --perl-regexp         PATTERN is a Perl regular expression
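
The same -P flag can stand in for the [[ =~ ]] test in the question's script. The following is a minimal sketch, assuming GNU grep built with PCRE support (not every grep provides -P); match_pcre is a hypothetical function name.

```bash
#!/bin/bash

# match_pcre STRING PATTERN
# Succeeds if PATTERN (Perl-compatible syntax) matches somewhere in STRING.
# Assumes GNU grep with -P support.
match_pcre() {
    # -q: print nothing, only the exit status matters
    # -e: protects patterns that start with a dash
    printf '%s\n' "$1" | grep -qP -e "$2"
}

if match_pcre "catfish" '(?<=cat)fish'; then echo "true"; else echo "false"; fi   # true
if match_pcre "ab ab" '(a)(b) \1\2';    then echo "true"; else echo "false"; fi   # true
if match_pcre "dogfish" '(?<=cat)fish'; then echo "true"; else echo "false"; fi   # false
```

Single-quoting the patterns keeps the shell from touching the backslashes before grep sees them.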

(?=pattern)
    is a positive look-ahead assertion
(?!pattern)
    is a negative look-ahead assertion
(?<=pattern)
    is a positive look-behind assertion
(?<!pattern)
    is a negative look-behind assertion

References

How To Use Backreference in Bash

Related Solutions

Bash – What regular expression engine type does bash use

Bash (like POSIX shells in general) uses glob patterns, not regular expressions, in the case statement.

There's limited support for regular expressions using the =~ operator; see details at: http://mywiki.wooledge.org/BashGuide/Patterns,
which says that bash uses Extended Regular Expressions (ERE).
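
To make the distinction concrete, here is a small sketch that tests the same string once with a case glob and once with an ERE through =~. The file name is an arbitrary example value.

```bash
#!/bin/bash

file="report2024.txt"   # arbitrary example value

# case uses glob patterns: * matches any string, [0-9] matches one digit.
case $file in
    report[0-9]*.txt) echo "glob: match" ;;
    *)                echo "glob: no match" ;;
esac

# [[ =~ ]] uses ERE: {4} is a quantifier, and a literal dot must be escaped.
re='^report[0-9]{4}\.txt$'
if [[ $file =~ $re ]]; then
    echo "ERE: match (${BASH_REMATCH[0]})"
else
    echo "ERE: no match"
fi
```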

bash grep – Why Quote Escaped Character in Regex

Why? Because your shell interprets some special characters, such as the \ in your example.

You are running into trouble because the string you pass to grep is not protected from the shell.

Several solutions:

single-quote the string,
double-quote the string (inside double quotes the shell still interprets several things, such as $variables, before sending the resulting string to the command),
or use no quoting at all (which I strongly advise against) and add backslashes in the right places to stop the shell from interpreting the following character before sending it to the command.

I recommend protecting the string with single quotes, as they keep everything inside them literal:

grep '9\.0' #send those 4 characters to grep in a single argument

The shell passes the single-quoted string on literally.

Note: The only character you can't include inside a single-quoted shell string is a single quote, because it ends the quoting. To include one, end the single-quoted string, add an escaped single quote \' (or one inside double quotes: "'"), and then immediately reopen the single quotes. For example, to have the shell pass a'b to grep, write grep 'a'\''b' or grep 'a'"'"'b'.
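
A quick way to check this kind of quoting is to hand the argument to printf before trusting it with grep. This is only an illustrative sketch:

```bash
# Each of these prints: a'b
printf '%s\n' 'a'\''b'
printf '%s\n' 'a'"'"'b'
printf '%s\n' "a'b"

# The same argument passed to grep as a pattern; prints: a'b
printf '%s\n' "a'b" "ab" | grep 'a'\''b'
```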

If you insist on not quoting, you need to write \\ for the shell to send a single \ to grep.

grep 9\\.0  # ie: a 9, a pair \\, a ., and a 0 , and the shell interprets the pair \\ into a literal \

If you use double quotes, keep in mind that the shell interprets several things first ($variables, backslashes, etc.). Inside double quotes, a backslash is special only before $, `, ", \ or a newline: \\ becomes a single \, while a sequence such as \w is passed through unchanged. Outside quotes, a backslash always escapes the next character, so \w becomes the single letter w and \\ becomes a single \.

grep "9\\.0"  # looks here the same as not quoting at all... 
    #but doublequoting allows you to have spaces, etc, inside the string
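
The following sketch shows what each quoting style actually delivers to a command. show_arg is a hypothetical helper that just prints its first argument as received.

```bash
# show_arg is a hypothetical helper: it prints its first argument unchanged.
show_arg() { printf '%s\n' "$1"; }

show_arg '9\.0'    # single quotes, nothing interpreted:   9\.0
show_arg 9\\.0     # unquoted, \\ becomes \:               9\.0
show_arg "9\\.0"   # double quotes, \\ becomes \:          9\.0
show_arg "9\.0"    # double quotes, \. is left alone:      9\.0
show_arg 9\.0      # unquoted, \. becomes a plain dot:     9.0
```

In the last case grep would receive the pattern 9.0, where the dot matches any character, so a line such as 9x0 would also match.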
