Linux – Search for files with more than one term (grep, awk?)

awk, grep, linux, sed

I am using a command like this to find files with the word 'term' in them:

grep -l term *

But I now want to find files that contain two different words (let's call them termA and termB), not necessarily on the same line. I want only files that contain both terms, not files that contain just one of them.

Now I could write a cumbersome bash script for this, but does grep, egrep, awk, sed or anything else have a tool that can help me?

Thanks in advance.

Best Answer

If your files contain no null bytes

In this case, you can use grep alone:

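grep -Plz "(?s)termA.*termB|termB.*termA" *

How it works:

  • The Perl Compatible Regular Expressions (selected with -P) termA.*termB and termB.*termA each match a string that contains both terms, in one order or the other.

  • The combined PCRE termA.*termB|termB.*termA therefore matches any string containing both terms.

  • The -z switch makes data lines end in null bytes instead of newlines. A file with no null bytes is then read as one long line, so a match can span what were separate lines.

  • The (?s) prefix lets . match newline characters as well. Without it, a PCRE . stops at each newline, and only files with both terms on the same line would be found.

  • Finally, -l prints only the names of the matching files. Note that -P and -z are GNU grep extensions.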

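By the way, there's no need to use -P. If you prefer to continue using basic regular expressions, the syntax is similar, except that alternation is written \| (a GNU extension to POSIX BRE). With -z, the . in a basic regular expression also matches newlines: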

grep -lz "termA.*termB\|termB.*termA" *
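
To see what -z changes, here is a small sketch you can run in a scratch directory. The file names (both.txt, only_a.txt) and variable names are made up for the demo, and it assumes GNU grep, since -z and \| are GNU extensions.

```shell
# Hypothetical demo files in a throwaway directory.
demo_dir=$(mktemp -d)
printf 'termA\nsome text\ntermB\n' > "$demo_dir/both.txt"    # both terms, on different lines
printf 'termA only\n'              > "$demo_dir/only_a.txt"  # just one of the terms

# Without -z, grep works line by line, so ".*" never sees both terms together.
# "|| true" keeps grep's "no match" exit status from aborting a script run with set -e.
without_z=$(cd "$demo_dir" && grep -l -- "termA.*termB\|termB.*termA" * || true)

# With -z, each file (having no null bytes) is one long "line", so both.txt matches.
with_z=$(cd "$demo_dir" && grep -lz -- "termA.*termB\|termB.*termA" * || true)

printf 'without -z: [%s]\n' "$without_z"   # without -z: []
printf 'with -z:    [%s]\n' "$with_z"      # with -z:    [both.txt]

rm -rf "$demo_dir"
```

Only both.txt is reported, and only when -z is given.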

If your files contain null bytes

In this case, you'll need auxiliary tools:

(grep -l termA * ; grep -l termB *) | sort | uniq -d

How it works:

  • grep -l termA * ; grep -l termB * lists every file containing termA, followed by every file containing termB. Since grep -l prints each file name at most once, files that contain both terms will be displayed exactly twice.

  • sort sorts the output.

  • uniq -d displays only duplicated lines. It compares adjacent lines, which is why the list has to be sorted first.
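
The steps above can be wrapped in a small function and tried on sample files. This is only a sketch: the function name files_with_both and the demo file names are invented here, and it assumes no file name contains a newline, since the pipeline treats each output line as one file name.

```shell
# Hypothetical helper: list files in the current directory that match both patterns.
files_with_both() {
    # Each grep -l prints a matching file name once, so a name that shows up
    # twice in the combined list matched both patterns.
    # 2>/dev/null hides complaints about directories picked up by "*".
    { grep -l -- "$1" * ; grep -l -- "$2" * ; } 2>/dev/null | sort | uniq -d
}

# Demo in a throwaway directory; both.bin contains a null byte on purpose.
demo_dir=$(mktemp -d)
printf 'termA\000\ntermB\n' > "$demo_dir/both.bin"
printf 'termA\n'            > "$demo_dir/only_a.txt"
printf 'termB\n'            > "$demo_dir/only_b.txt"

result=$(cd "$demo_dir" && files_with_both termA termB)
printf '%s\n' "$result"   # both.bin

rm -rf "$demo_dir"
```

Because each grep reads the file on its own, the null byte between the two terms does not matter here.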
