Multiline Regexp (grep, sed, awk, perl)

Tags: awk, grep, regular expression, sed

I know that multiline regexp has been discussed dozens of times but I just can't get it to work with my pattern.

I'll try to explain.
I have some text files in a directory.
Example of text in a file:

LINE OF TEXT 2
LINE OF TEXT 1
LINE OF TEXT 3

LINE OF TEXT 1
LINE OF TEXT 2
LINE OF TEXT 3

LINE OF TEXT 1
LINE OF TEXT 3

LINE OF TEXT 3
LINE OF TEXT 2
LINE OF TEXT 1

LINE OF TEXT 2
LINE OF TEXT 3

I want to find "LINE OF TEXT 3" which comes after "LINE OF TEXT 2" which in turn comes after "LINE OF TEXT 1" (with no empty lines in between).

Each line is itself described by a regexp, for example a line that starts with "LINE" and ends with a particular number.

Note: not all files contain that exact line sequence. If the pattern matches, don't print the matched text; just print the filename to STDOUT.

Can this be done with a one-liner? For example, awk searches a file for the pattern and prints the filename to STDOUT if the pattern is found. I can then use it in combination with "find -exec".

Any of the mentioned tools will do (grep, awk, sed or perl).

Best Answer

You can do this with Awk by setting the "Record Separator" variable to be a regex matching at least two consecutive newline characters:

awk -v RS='\n\n+' '/1.*2.*3/' file.txt
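A regex-valued RS works in gawk and mawk, but POSIX leaves the behaviour unspecified when RS is longer than one character. A minimal sketch of a portable alternative is awk's paragraph mode (RS set to the empty string), which also splits records on blank lines. The file name sample.txt and the printed format are assumptions made for this illustration:

```shell
# Build a sample file from the first three blocks of the question's example.
# (sample.txt is a made-up name for this sketch.)
tmpdir=$(mktemp -d)
printf '%s\n' \
  'LINE OF TEXT 2' 'LINE OF TEXT 1' 'LINE OF TEXT 3' '' \
  'LINE OF TEXT 1' 'LINE OF TEXT 2' 'LINE OF TEXT 3' '' \
  'LINE OF TEXT 1' 'LINE OF TEXT 3' > "$tmpdir/sample.txt"

# Paragraph mode: an empty RS makes every blank-line-separated block one record.
# With -F '\n', each line of the block is one field.
matches=$(awk -v RS= -F '\n' '
  /1.*2.*3/ { print NR ": " $1 " | " $2 " | " $3 }
' "$tmpdir/sample.txt")

printf '%s\n' "$matches"
# prints: 2: LINE OF TEXT 1 | LINE OF TEXT 2 | LINE OF TEXT 3

rm -r "$tmpdir"
```

Only the second block is reported, because it is the only one where a 1 is followed by a 2 and then a 3.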

You can also set the "Field Separator" (FS, here via -F) to a single newline character, so that each line of a block becomes its own field:

awk -v RS='\n\n+' -F '\n' '$1 == "LINE OF TEXT 1" && $2 == "LINE OF TEXT 2" && $3 == "LINE OF TEXT 3"' file.txt

Broken up for readability:

awk -v RS='\n\n+' -F '\n' '
  $1 == "LINE OF TEXT 1" &&
  $2 == "LINE OF TEXT 2" &&
  $3 == "LINE OF TEXT 3"
' file.txt
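The question asks for each line to be a regexp rather than a fixed string. A sketch of that variant replaces the string comparisons with the ~ operator. The three patterns are illustrative guesses at "starts with LINE and ends with a particular number", and matching_blocks is a hypothetical helper name:

```shell
# matching_blocks FILE: print the record number of every block whose first
# three lines match the three per-line patterns, in order.
# (Hypothetical helper; the patterns are examples, not taken from the question.)
matching_blocks() {
  awk -v RS='\n\n+' -F '\n' '
    # Each field is one line, so ^ and $ anchor to that line only.
    $1 ~ /^LINE.*1$/ &&
    $2 ~ /^LINE.*2$/ &&
    $3 ~ /^LINE.*3$/ { print NR }
  ' "$1"
}
```

Called as matching_blocks file.txt, it prints one number per matching block and nothing when no block matches.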

To print only the filename when a match is found, count the matches and report in an END rule (the counter cannot be called "match", because that is the name of a built-in Awk function):

awk -v RS='\n\n+' -F '\n' '
  $1 == "LINE OF TEXT 1" &&
  $2 == "LINE OF TEXT 2" &&
  $3 == "LINE OF TEXT 3" {
    found++
  }
  END {
    if (found) {
      print FILENAME
    }
  }
' file.txt
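An END rule runs once per awk invocation, so with several file arguments it can report at most one name. A sketch for handling many files in one call prints FILENAME from the main rule and remembers which files were already reported. The function name list_matching and the array name seen are assumptions made for this example:

```shell
# list_matching FILE...: print the name of every file containing the block.
# (Hypothetical helper name, not part of the answer.)
list_matching() {
  awk -v RS='\n\n+' -F '\n' '
    # Skip files that were already reported, so each name is printed once.
    !(FILENAME in seen) &&
    $1 == "LINE OF TEXT 1" &&
    $2 == "LINE OF TEXT 2" &&
    $3 == "LINE OF TEXT 3" {
      seen[FILENAME] = 1
      print FILENAME
    }
  ' "$@"
}
```

For example, list_matching *.txt lists the matching files in the current directory. Unlike the find approach, awk still reads the rest of a file after its first match.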

But considering you are talking about using find in combination with awk, I'd recommend just using Awk for the exit status and using find for the printing:

find . -type f -exec awk -v RS='\n\n+' -F '\n' '
  $1 ~ /LINE OF TEXT 1/ &&
  $2 ~ /LINE OF TEXT 2/ &&
  $3 ~ /LINE OF TEXT 3/ {
    found = 1
    exit
  }
  # exit in a main rule still runs END, so the status is set here
  END { exit !found }
' {} \; -print

That way, if you want to do something else before printing (some other find primary), you're already set up to do so.
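To try the find approach without touching real data, it can be run against a throwaway directory. The sketch below assumes made-up file names (hit.txt, miss.txt) and uses a second -exec with basename as a stand-in for "some other find primary". Awk records the match in a flag and sets the exit status once, in END, because an exit in a main rule still runs the END rule and a later exit with a status replaces the earlier one:

```shell
# Throwaway directory with one matching and one non-matching file.
# (File names are invented for this sketch.)
dir=$(mktemp -d)
printf '%s\n' 'LINE OF TEXT 1' 'LINE OF TEXT 2' 'LINE OF TEXT 3' > "$dir/hit.txt"
printf '%s\n' 'LINE OF TEXT 1' 'LINE OF TEXT 3' > "$dir/miss.txt"

# awk exits 0 only when some block matched; find evaluates the primaries after
# the first -exec only for those files. Here the follow-up prints the base name.
found=$(find "$dir" -type f -exec awk -v RS='\n\n+' -F '\n' '
  $1 ~ /LINE OF TEXT 1/ &&
  $2 ~ /LINE OF TEXT 2/ &&
  $3 ~ /LINE OF TEXT 3/ { ok = 1; exit }
  END { exit !ok }
' {} \; -exec basename {} \;)

printf '%s\n' "$found"
# prints: hit.txt

rm -r "$dir"
```

Cheap tests such as -name or -size are best placed before the awk -exec, so that awk is started only for files that pass them.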