Grep – Find Multiple AND Patterns in Any Order

grep

Is there any way to use grep to search for entries matching multiple patterns in any order using a single condition?

As shown in "How to run grep with multiple AND patterns?", for multiple patterns I can use

grep -e 'foo.*bar' -e 'bar.*foo'

but I have to write 2 conditions here, 6 conditions for 3 patterns, and so on.
I want to write a single condition if possible.
For finding patterns in any order you could suggest:

grep -e 'foo' | grep -e 'bar' # at least I do not retype patterns here

and this will work, but I would like to see colored output, and in this case only bar will be highlighted.

I would like to write a condition as simple as

awk '/foo/ && /bar/'

if that is possible for grep (awk does not highlight results, and I doubt it can be done easily).

agrep can probably do what I want, but I wonder whether my default grep (2.10-1) on Ubuntu 12.04 can do this.

Best Answer

If your version of grep supports PCRE (GNU grep does this with the -P or --perl-regexp option), you can use lookaheads to match multiple words in any order:

grep -P '(?=.*?word1)(?=.*?word2)(?=.*?word3)^.*$'

This won't highlight the words, though. Lookaheads are zero-length assertions; they're not part of the matched text.
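As a quick check of the lookahead form, here is a small sketch run against made-up sample lines. It assumes a grep built with PCRE support (the -P option mentioned above); the sample text and the variable names are only for illustration.

```shell
# Sample input (made up for illustration).
sample='foo then bar
bar then foo
only foo here
neither word'

# Each lookahead asserts that one word occurs somewhere on the line,
# so the order in which the words appear does not matter.
and_matches=$(printf '%s\n' "$sample" | grep -P '(?=.*?foo)(?=.*?bar)^.*$')
printf '%s\n' "$and_matches"
```

Only the two lines containing both words should be printed, whichever word comes first.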

I think your piping solution should work for that. By default, grep only colors the output when it's going to a terminal, so only the last command in the pipeline does highlighting, but you can override this with --color=always.

grep --color=always foo | grep --color=always bar
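A minimal sketch of that pipeline on made-up input follows. It assumes a grep that accepts --color=always (GNU grep does); the sed step is only there to show that the plain text underneath the escape sequences is the expected set of lines, and the variable names are illustrative.

```shell
# Sample input (made up for illustration).
sample='foo then bar
bar then foo
only foo here'

# --color=always makes every stage emit escape sequences even though
# its output goes to a pipe rather than a terminal.
colored=$(printf '%s\n' "$sample" | grep --color=always foo | grep --color=always bar)
printf '%s\n' "$colored"

# Strip the color escape sequences to recover the plain matching lines.
esc=$(printf '\033')
plain=$(printf '%s\n' "$colored" | sed "s/${esc}\[[0-9;]*[mK]//g")
```

One thing to keep in mind: the second grep sees the escape sequences written by the first, so a pattern that overlaps the first highlighted word, or that happens to match characters inside an escape sequence, may not behave as it would on the plain text.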

Related Solutions

Using xargs to grep multiple patterns

Yes, find ./work -print0 | xargs -0 rm will execute something like rm ./work/a "work/b c" .... You can check with echo, find ./work -print0 | xargs -0 echo rm will print the command that will be executed (except white space will be escaped appropriately, though the echo won't show that).

To get xargs to put the names in the middle, you need to add -I[string], where [string] is the placeholder that will be replaced with the argument. In this case you'd use -I{}, e.g. <strings.txt xargs -I{} grep {} directory/*.
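To make the -I{} form concrete, here is a small self-contained sketch. The temporary directory, file names, and contents are all made up for the example; only the xargs invocation itself comes from the text above.

```shell
# Throwaway files so the example is self-contained (names are made up).
tmp=$(mktemp -d)
mkdir "$tmp/directory"
printf 'alpha\nbeta\n' > "$tmp/strings.txt"
printf 'alpha one\ngamma\n' > "$tmp/directory/a.txt"
printf 'beta two\n' > "$tmp/directory/b.txt"

# xargs substitutes each input line for {} and runs grep once per line:
#   grep alpha directory/a.txt directory/b.txt
#   grep beta  directory/a.txt directory/b.txt
per_string=$(cd "$tmp" && <strings.txt xargs -I{} grep {} directory/*)
printf '%s\n' "$per_string"
rm -rf "$tmp"
```

Because grep is started once per line of strings.txt, this gets slow for long lists, which is the reason to prefer a single grep that reads all patterns at once.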

What you actually want to use is grep -F -f strings.txt:

-F, --fixed-strings
  Interpret PATTERN as a  list  of  fixed  strings,  separated  by
  newlines,  any  of  which is to be matched.  (-F is specified by
  POSIX.)
-f FILE, --file=FILE
  Obtain  patterns  from  FILE,  one  per  line.   The  empty file
  contains zero patterns, and therefore matches nothing.   (-f  is
  specified by POSIX.)

So grep -Ff strings.txt subdirectory/* will find all occurrences of any string in strings.txt as a literal. If you drop the -F option, you can use regular expressions in the file. You could actually use grep -F "$(<strings.txt)" directory/* too. If you want to practice find, you can use the last two examples in the summary. If you want to do a recursive search instead of just the first level, you have a few options, also in the summary.
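The difference between keeping and dropping -F can be seen with a pattern that contains a regex metacharacter. This is a small sketch with made-up file contents; the paths and variable names are just for the example.

```shell
# Throwaway files (contents are made up for illustration).
tmp=$(mktemp -d)
printf 'a.c\n' > "$tmp/strings.txt"
printf 'a.c\nabc\n' > "$tmp/input.txt"

# With -F the dot is a literal character, so only "a.c" matches.
fixed=$(grep -Ff "$tmp/strings.txt" "$tmp/input.txt")

# Without -F the dot is a regex wildcard, so "abc" matches as well.
regex=$(grep -f "$tmp/strings.txt" "$tmp/input.txt")

printf 'fixed:\n%s\nregex:\n%s\n' "$fixed" "$regex"
rm -rf "$tmp"
```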

Summary:

# grep for each string individually.
<strings.txt xargs -I{} grep {} directory/*

# grep once for everything
grep -Ff strings.txt subdirectory/*
grep -F "$(<strings.txt)" directory/*

# Same, using find
find subdirectory -maxdepth 1 -type f -exec grep -Ff strings.txt {} +
find subdirectory -maxdepth 1 -type f -print0 | xargs -0 grep -Ff strings.txt

# Recursively
grep -rFf strings.txt subdirectory
find subdirectory -type f -exec grep -Ff strings.txt {} +
find subdirectory -type f -print0 | xargs -0 grep -Ff strings.txt

You may want to use the -l option to get just the name of each matching file if you don't need to see the actual line:

-l, --files-with-matches
  Suppress  normal  output;  instead  print the name of each input
  file from which output would normally have  been  printed.   The
  scanning  will  stop  on  the  first match.  (-l is specified by
  POSIX.)
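As a rough illustration of -l combined with the first-level and recursive forms from the summary, here is a sketch on a made-up directory tree. It assumes a find that supports -maxdepth and a grep that supports -r, as the summary commands already do; all file names are invented.

```shell
# Throwaway tree (names and contents are made up for illustration).
tmp=$(mktemp -d)
mkdir -p "$tmp/subdirectory/nested"
printf 'needle\n' > "$tmp/strings.txt"
printf 'a needle\n' > "$tmp/subdirectory/top.txt"
printf 'another needle\n' > "$tmp/subdirectory/nested/deep.txt"
printf 'nothing here\n' > "$tmp/subdirectory/miss.txt"

# First level only: nested/deep.txt is never visited.
first_level=$(cd "$tmp" && find subdirectory -maxdepth 1 -type f -exec grep -lFf strings.txt {} +)

# Recursive: both matching files are listed (sorted for stable output).
recursive=$(cd "$tmp" && grep -rlFf strings.txt subdirectory | sort)

printf 'first level:\n%s\nrecursive:\n%s\n' "$first_level" "$recursive"
rm -rf "$tmp"
```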

Bash – counting multiple patterns in a single pass with grep

IFS=$'\n'
gzip -dc file.gz | grep -v '^>' | grep -Foe "${tri[*]}" | sort | uniq -c

But by the way, AAAC matches both AAA and AAC, but grep -o will output only one of them. Is that what you want? Also, how many occurrences of AAA in AAAAAA? 2 or 4 ([AAA]AAA, A[AAA]AA, AA[AAA]A, AAA[AAA])?
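Both effects can be checked on the two short strings mentioned above. This is a small sketch; the patterns are passed as a newline-separated list built with printf rather than from the tri array, and the variable names are made up.

```shell
# grep -o reports non-overlapping matches, scanning left to right,
# so AAAAAA yields [AAA][AAA] and nothing else.
non_overlapping=$(printf 'AAAAAA\n' | grep -Fo 'AAA' | wc -l | tr -d ' ')

# With two patterns, the leftmost match (AAA) is printed; the AAC that
# overlaps it is never reported.
reported=$(printf 'AAAC\n' | grep -Foe "$(printf 'AAA\nAAC')")

printf 'AAA in AAAAAA: %s\nreported for AAAC: %s\n' "$non_overlapping" "$reported"
```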

Maybe you want instead:

gzip -dc file.gz | grep -v '^>' | fold -w3 | grep -Fxe "${tri[*]}" | sort | uniq -c

That is, split the lines into groups of 3 characters and count the occurrences as full lines (this would find 0 occurrences of AAA in ACAAATTCG, as that's ACA AAT TCG).
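Here is that framed variant applied to the ACAAATTCG example, as a small sketch. The pattern list is built with printf instead of the tri array, and the trailing awk only normalizes the padding that uniq -c adds; both are choices made for this example.

```shell
# fold -w3 cuts the line into ACA, AAT, TCG; grep -Fx keeps only the
# groups that are exactly one of the listed trimers.
counts=$(printf 'ACAAATTCG\n' \
  | fold -w3 \
  | grep -Fxe "$(printf 'AAA\nACA\nTCG')" \
  | sort | uniq -c \
  | awk '{ print $2, $1 }')   # reorder to "trimer count"
printf '%s\n' "$counts"
```

AAA does not appear in the output at all, since no 3-character group equals AAA in this reading frame.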

Or on the other hand:

gzip -dc file.gz | awk '
  BEGIN{n=ARGC;ARGC=0}
  !/^>/ {l = length - 2; for (i = 1; i <= l; i++) a[substr($0,i,3)]++}
  END{for (i=1;i<n;i++) printf "%s: %d\n", ARGV[i], a[ARGV[i]]}' "${tri[@]}"

(would find 4 occurrences of AAA in AAAAAA).
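The sliding-window count can be tried on the AAAAAA example like this. It is a sketch along the lines of the awk program above, with the trimers passed as literal arguments instead of the tri array and with ARGC set to 1 (rather than 0) so that awk reads standard input; the header line and variable name are made up.

```shell
# Count every 3-character window, including overlapping ones.
overlapping=$(printf '>seq1\nAAAAAA\n' | awk '
  # Remember how many arguments there are, then hide them from awk so
  # the trimers are not opened as input files.
  BEGIN { n = ARGC; ARGC = 1 }
  # Skip header lines; slide a window of width 3 across each sequence.
  !/^>/ { l = length($0) - 2; for (i = 1; i <= l; i++) a[substr($0, i, 3)]++ }
  # Report the count for each requested trimer (0 if never seen).
  END { for (i = 1; i < n; i++) printf "%s: %d\n", ARGV[i], a[ARGV[i]] }
' AAA AAC)
printf '%s\n' "$overlapping"
```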
