Assume there is a simple test PHP script:
<?php
$a = ($argc == 2 ? $argv[1] : 10);
for ($i = 0; $i < $a; $i++) {
    echo '.';
}
echo PHP_EOL;
Now, I do a grep or a conditional sed on the file:
grep '<' test.php
yields the two lines containing the <. That's clear.
grep '\?' test.php
yields the two lines containing the question mark. That's clear.
grep '<\?' test.php
returns all lines. Why? I expected it to output only the first line. Maybe the < should be escaped as well, but that yields another unexpected output.
sed -n '/pattern/p' test.php
yields the same results.
I tried to get an answer at https://regex101.com/, but to my surprise, the website shows what I expect. A quick and dirty PHP implementation of grep also yields what I expect:
<?php
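// Usage: php script.php PATTERN FILE
if (($fh = fopen($argv[2], 'r')) !== false) {
    // Compare against false explicitly so a final line of "0" is not skipped.
    while (($line = fgets($fh)) !== false) {
        if (mb_ereg($argv[1], $line) !== false) echo $line;
    }
    fclose($fh);
}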
My question is: What is the reasoning behind those matches in grep and sed?
Best Answer
grep’s default behaviour is to interpret regular expressions as basic regular expressions (BREs). These don’t support ? as a special symbol; in a BRE it’s an ordinary character, so
grep '<?' test.php
gives the result you’re expecting.
GNU grep treats escaped versions of symbols which have special meaning in extended regular expressions (EREs) but not in BREs as special symbols, even in BREs: thus in a BRE, \? has the same meaning as ? in an ERE. So grep '<\?' matches zero or one <, which matches every line (and highlights < if you have colour output enabled). The same reasoning applies to sed.
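To see the difference side by side, here is a small sketch that recreates the test script from the question in a temporary file and counts matching lines for each variant. It assumes GNU grep and GNU sed; other implementations may not treat \? as a quantifier in BREs. The variable names are only illustrative.

```shell
#!/bin/sh
# Recreate the question's test.php in a temporary file (6 lines).
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
<?php
$a = ($argc == 2 ? $argv[1] : 10);
for ($i = 0; $i < $a; $i++) {
    echo '.';
}
echo PHP_EOL;
EOF

# BRE, unescaped ?: a literal question mark, so only "<?php" matches.
bre_literal=$(grep -c '<?' "$tmp")

# BRE, escaped ? (GNU extension): "zero or one <", which matches every line.
bre_escaped=$(grep -c '<\?' "$tmp")

# ERE, escaped ?: the backslash makes the question mark literal again.
ere_escaped=$(grep -cE '<\?' "$tmp")

# Fixed-string search: no regex interpretation at all.
fixed=$(grep -cF '<?' "$tmp")

# sed uses BREs by default too, so the escaped form matches every line.
sed_escaped=$(sed -n '/<\?/p' "$tmp" | wc -l)

echo "grep    '<?'  : $bre_literal"
echo "grep    '<\\?' : $bre_escaped"
echo "grep -E '<\\?' : $ere_escaped"
echo "grep -F '<?'  : $fixed"
echo "sed     '<\\?' : $sed_escaped"

rm -f "$tmp"
```

With GNU tools, the unescaped BRE, the escaped ERE, and the fixed-string search should each report one line, while the escaped BRE reports all six. If the goal is simply to find a literal string, grep -F sidesteps the question of which regex flavour is in effect.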