Command-Line Bash Grep – How to Match a String with Grep Except Where It Is Part of Another String

bash, command line, grep

I explain my problem on Ubuntu 16.04 with the following example: The file is:

# cat file
aaa
aaaxxx
aaaxxx*aaa
aaa=aaaxxx
bbbaaaccc
aaaddd/aaaxxx

I want to display all lines that contain aaa at least once where it is not part of the combination aaaxxx. The output should look like this:

# grep SOMETHING-HERE file …
aaa
aaaxxx*aaa (second aaa is the hit)
aaa=aaaxxx (first aaa is the hit)
bbbaaaccc (aaa in any other combination but not aaaxxx)
aaaddd/aaaxxx (similar to above)

I tried things like grep -v aaaxxx file | grep aaa, which results in:

aaa
bbbaaaccc

or

# egrep -P '(?<!aaaxxx )aaa' file
grep: the specified search patterns conflict with each other (translated from the German error message)

Is there a (simple) way to do this? Of course, it doesn't need to be grep.
Thanks

Best Answer

It's straightforward using a Perl-style negative lookahead, which is available in grep's Perl Compatible Regular Expression (PCRE) mode via the -P switch. The pattern aaa(?!xxx) matches aaa only where it is not immediately followed by xxx:

$ grep -P 'aaa(?!xxx)' file
aaa
aaaxxx*aaa
aaa=aaaxxx
bbbaaaccc
aaaddd/aaaxxx

(In the original output, grep highlighted the matched parts in bold: each aaa that is not immediately followed by xxx.)
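
Since the highlighting does not survive in plain text, the matched parts can be listed explicitly instead. The following is a minimal sketch, assuming GNU grep built with PCRE support; the temporary file created with mktemp is a stand-in for the question's file, and the -n and -o options are additions not used in the answer above:

```shell
# Recreate the sample file from the question in a temporary file (stand-in for "file").
sample=$(mktemp)
printf '%s\n' 'aaa' 'aaaxxx' 'aaaxxx*aaa' 'aaa=aaaxxx' 'bbbaaaccc' 'aaaddd/aaaxxx' > "$sample"

# -o prints only the matched text, -n prefixes each match with its line number.
matches=$(grep -noP 'aaa(?!xxx)' "$sample")
printf '%s\n' "$matches"

rm -f "$sample"
```

Line 2 (aaaxxx) produces no match at all, while lines 3, 4 and 6 each report only the one aaa that is not followed by xxx.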


Although the zero-length lookahead is convenient, you can select the same lines using GNU Extended Regular Expression (ERE) syntax, for example by matching aaa followed by at most two x characters, followed by either a non-x character or the end of the line:

grep -E 'aaax{0,2}([^x]|$)' file

or even using GNU Basic Regular Expression (BRE) syntax:

grep 'aaax\{0,2\}\([^x]\|$\)' file

Both of these select the following lines:

aaa
aaaxxx*aaa
aaa=aaaxxx
bbbaaaccc
aaaddd/aaaxxx
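
To check that the three variants agree on the sample data, their outputs can be compared directly. This is only a sketch, assuming GNU grep (for -P and for the \| alternation in BRE); the variable names are illustrative:

```shell
# Sample lines from the question, kept in a variable instead of a file.
sample=$(printf '%s\n' 'aaa' 'aaaxxx' 'aaaxxx*aaa' 'aaa=aaaxxx' 'bbbaaaccc' 'aaaddd/aaaxxx')

# Run the same input through the PCRE, ERE and BRE variants.
pcre=$(printf '%s\n' "$sample" | grep -P 'aaa(?!xxx)')
ere=$(printf '%s\n' "$sample" | grep -E 'aaax{0,2}([^x]|$)')
bre=$(printf '%s\n' "$sample" | grep 'aaax\{0,2\}\([^x]\|$\)')

# Compare the selected lines of all three variants.
if [ "$pcre" = "$ere" ] && [ "$ere" = "$bre" ]; then
    echo "all three patterns select the same lines"
else
    echo "patterns disagree"
fi
```

The selected lines are the same, but the matched text is not: the ERE and BRE patterns consume the x characters and the following non-x character as part of the match, whereas the lookahead consumes nothing beyond aaa. This only matters when highlighting matches or printing them with -o.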