Text Processing – Find Files with Matched Whole Lines from a File

awk, grep, perl, text-processing

I have a file with this content:

$ cat compromised_header.txt
some unique string 1
some other unique string 2
another unique string 3

I want to find all files that contain all the lines of the above file, in exactly the same order, with no intervening lines.

Example input file:

$ cat a-compromised-file.txt
some unique string 1
some other unique string 2
another unique string 3
unrelated line x
unrelated line y
unrelated line z

I tried using the grep below:

grep -rlf compromised_header.txt dir/

But I'm not sure it gives only the expected files, since it would also match a file containing:

some unique string 1
unrelated line x
unrelated line y
unrelated line z
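It would indeed: grep -f treats each line of the pattern file as an independent pattern, so a file containing any single one of them matches. A minimal demonstration (the sandbox directory and file names below are invented for illustration):

```shell
# grep -f matches files containing ANY one pattern line, not the whole block
dir=$(mktemp -d)    # hypothetical sandbox
printf '%s\n' 'some unique string 1' 'some other unique string 2' > "$dir/hdr"
mkdir "$dir/tree"
printf '%s\n' 'some unique string 1' 'unrelated line x' > "$dir/tree/partial.txt"
grep -rlf "$dir/hdr" "$dir/tree"    # lists partial.txt despite the missing line
```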

Best Answer

Using an awk implementation that supports nextfile (GNU awk, for example):

# First pass (NR == FNR): store the header lines in a[1..n]
NR == FNR {
  a[++n]=$0; next
}
# Mismatch: reset the run counter, unless this very line restarts a match at a[1]
$0 != a[c+1] && (--c || $0!=a[c+1]) {
  c=0; next
}
# This line extended the run; after n consecutive matches, report the file
++c >= n {
  print FILENAME; c=0; nextfile
}

with find for recursion:

find dir -type f -exec gawk -f above.awk compromised_header.txt {} +

Or this might work:

pcregrep -rxlM "$( perl -lpe '$_=quotemeta' compromised_header.txt )" dir

Using perl to escape metacharacters because pcregrep doesn't seem to combine --fixed-strings with --multiline.

With perl in slurp mode (won't work with files that are too large to hold in memory):

find dir -type f -exec perl -n0777E 'BEGIN {$f=<>} say $ARGV if /^\Q$f/m
' compromised_header.txt {} +