Centos – Negative lookahead on multiple strings

centos, grep, regular expression

I need to find usages of Short Open Tags in PHP files, which means matching <? but not <?php, <?xml, or <?=. In most regex flavours that would be something like this:

 <\?(?!php|xml|=)

However, the following line is matching the unwanted <?php, <?xml, and <?= portions:

$ grep -r -E "<\?(?\!php|=|xml)" *

I've tried scores of permutations of backslashes, -P and -e flags. How does one properly use a negative lookahead in GNU grep?

CentOS 7.3 (KDE desktop), GNU grep 2.20 (the online docs are for 3.0, but I've got man locally), Nescafé Decaff (this might actually be the real problem).

Best Answer

You'll need -P, which enables PCRE and with it the Perl-style (?!...) negative lookahead, and you must not escape the ! inside (?!...).

-bash-4.2$ cat input
<?php
<?xml
<?=
<?okay
<?
-bash-4.2$ grep -P '<\?(?!php|xml|=)' input
<?okay
<?
-bash-4.2$ 
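
For the recursive search in the question, the same pattern can be combined with -r and --include. This is a minimal sketch, assuming a GNU grep built with PCRE support; the temporary directory and the file names in it are made up for the demonstration.

```shell
# Build a throwaway tree (hypothetical file names) to search.
demo=$(mktemp -d)
printf '%s\n' '<?php echo 1; ?>' '<? echo 2; ?>' > "$demo/a.php"
printf '%s\n' '<?= $x ?>' '<?xml version="1.0"?>' > "$demo/b.php"
printf '%s\n' '<? not a PHP file' > "$demo/notes.txt"

# -r recurse, -n show line numbers, -P use PCRE so (?!...) works.
# --include limits the search to *.php files.
# Single quotes keep the shell from touching ! or \ in the pattern.
matches=$(cd "$demo" && grep -rnP --include='*.php' '<\?(?!php|xml|=)' .)
printf '%s\n' "$matches"

# Clean up the demonstration tree.
rm -rf "$demo"
```

With the files above, only line 2 of a.php should be reported; notes.txt is skipped because of --include.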

"<\?(?\!php|=|xml)" is incorrect because the shell passes (?\!...) through to grep with the backslash intact, and ?\! is not ?! as far as the regular expression engine is concerned. If you are unsure what the shell is passing through to a program, either write some code to inspect it:

$ perl -E 'printf "%*vd\n","\t",$ARGV[0];say join "\t",split //,$ARGV[0]' "?\!"
63  92  33
?   \   !
$ 

Or use something like strace to see what grep got:

-bash-4.2$ strace -o grep grep "?\!grep" /etc/passwd
-bash-4.2$ grep grep grep
execve("/usr/bin/grep", ["grep", "?\\!grep", "/etc/passwd"], [/* 24 vars */]) = 0
-bash-4.2$ 
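
A lighter-weight check along the same lines is to hand the quoted string to printf and look at what comes out. This is only a sketch of the quoting behaviour, and the variable names dq and sq are chosen for illustration.

```shell
# Double quotes: the backslash in front of ! is kept, so a program
# would receive the three characters ? \ !
dq=$(printf '%s' "?\!")

# Single quotes: nothing inside is special, so a program receives ?!
sq=$(printf '%s' '?!')

printf 'double-quoted: %s\nsingle-quoted: %s\n' "$dq" "$sq"
```

This is why the pattern in the working grep command is single-quoted and contains no backslash before the !.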