Find matches on adjacent lines

awk, grep, perl, sed, text processing

I want to find matches that occur on adjacent lines. For example, if the pattern matches are:

$ grep -n pattern file1 file2 file3
file1:10: ...
file2:100: ...
file2:1000: ...
file2:1001: ...
file3:1: ...
file3:123: ...

I want to find the middle two matches:

file2:1000: ...
file2:1001: ...

but not the first two or the last two.

Best Answer

I'll use the same test file as thrig:

$ cat file
a
pat 1
pat 2
b
pat 3

Here is an awk solution:

$ awk '/pat/ && last {print last; print} {last=""} /pat/{last=$0}' file
pat 1
pat 2

How it works

awk implicitly loops over every line in the file. This program uses one variable, last, which holds the previous line if that line matched the regex pat. Otherwise, it holds the empty string.

  • /pat/ && last {print last; print}

    If pat matches this line and the previous line, last, was also a match, then print both lines.

  • {last=""}

    Reset last to the empty string. This runs on every line, so a non-matching line clears any remembered match.

  • /pat/ {last=$0}

    If this line matches pat, then set last to this line. This way it will be available when we process the next line.
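
To make the three rules easier to follow, here is the same program spread over several lines with comments, as a self-contained sketch. The function name adjacent_pairs and the scratch directory are my own additions for illustration; the awk program itself is unchanged.

```shell
# Work in a scratch directory and recreate the five-line test file.
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'a' 'pat 1' 'pat 2' 'b' 'pat 3' > file

# adjacent_pairs is a hypothetical wrapper name, not part of the answer.
adjacent_pairs() {
    awk '
        # This line matches and the previous line matched too: print the pair.
        /pat/ && last { print last; print }

        # Runs on every line: forget whatever was remembered.
        { last = "" }

        # Remember the current line, but only if it matches.
        /pat/ { last = $0 }
    ' "$@"
}

adjacent_pairs file
```

This prints pat 1 and pat 2. The order of the rules matters: the test against last has to come before last is cleared and reassigned.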

Alternative for treating >2 consecutive matches as one group

Let's consider this extended test file:

$ cat file2
a
pat 1
pat 2
b
pat 3
c
pat 4
pat 5
pat 6
d

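The solution above would print pat 5 twice here, once as the second line of the pair pat 4/pat 5 and once as the first line of the pair pat 5/pat 6. This code instead treats the three consecutive matching lines as one group and prints each line once: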

$ awk '/pat/{f++; if (f==2) print last; if (f>=2) print; last=$0; next} {f=0}' file2
pat 1
pat 2
pat 4
pat 5
pat 6

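This code uses two variables. As before, last is the previous matching line. In addition, f counts the number of consecutive matches so far and is reset to zero by any non-matching line. When f reaches 2, we print last, the first line of the group, followed by the current line. For every further match in the same group (f greater than 2), we print only the current line.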
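
If the counter is hard to picture, a small debugging sketch can print f in front of every line of file2. This is only an illustration and not part of the solution; the function name show_counter is made up, and the scratch directory is there so that no existing file gets overwritten.

```shell
# Work in a scratch directory and recreate the extended test file.
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'a' 'pat 1' 'pat 2' 'b' 'pat 3' 'c' 'pat 4' 'pat 5' 'pat 6' 'd' > file2

# show_counter is a hypothetical helper name used only for this demo.
show_counter() {
    awk '
        /pat/  { f++ }      # a match extends the current run
        !/pat/ { f = 0 }    # a non-matching line ends the run
        { printf "%d %s\n", f, $0 }
    ' "$@"
}

show_counter file2
```

Running it gives:

```shell
0 a
1 pat 1
2 pat 2
0 b
1 pat 3
0 c
1 pat 4
2 pat 5
3 pat 6
0 d
```

The lines with a count of 2 or more are the ones the solution prints, plus the count-1 line directly before each 2. The lone pat 3 never gets past 1, so it is skipped.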

Adding grep-like features

To emulate the grep output shown in the question, this version prints the filename and line number before each matching line:

$ awk 'FNR==1{f=0} /pat/{f++; if (f==2) printf "%s:%s:%s\n",FILENAME,FNR-1,last; if (f>=2) printf "%s:%s:%s\n",FILENAME,FNR,$0; last=$0; next} {f=0}' file file2
file:2:pat 1
file:3:pat 2
file2:2:pat 1
file2:3:pat 2
file2:7:pat 4
file2:8:pat 5
file2:9:pat 6

Awk's FILENAME variable provides the name of the current file, and awk's FNR provides the line number within that file.

At the beginning of each file, FNR==1, we reset f to zero. This prevents the last line of one file from being considered consecutive with the first line of the next file.

For those who like their code spread over multiple lines, the above looks like:

awk '
    FNR==1{f=0}
    /pat/ {f++
        if (f==2) printf "%s:%s:%s\n",FILENAME,FNR-1,last
        if (f>=2) printf "%s:%s:%s\n",FILENAME,FNR,$0
        last=$0
        next
    }

    {f=0}
    ' file file2
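
To see why the FNR==1 reset matters, here is a small sketch with two made-up files, end.txt and start.txt, where the first file ends with a match and the second file begins with one. The file names, the prog variable, and the scratch directory are assumptions for this demo only.

```shell
# Work in a scratch directory; end.txt and start.txt are hypothetical names.
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'x' 'pat A' > end.txt      # last line matches
printf '%s\n' 'pat B' 'y' > start.txt    # first line matches

# The grep-like program from above, minus the FNR==1 rule.
prog='/pat/{f++; if (f==2) printf "%s:%s:%s\n",FILENAME,FNR-1,last; if (f>=2) printf "%s:%s:%s\n",FILENAME,FNR,$0; last=$0; next} {f=0}'

echo "without the reset:"
awk "$prog" end.txt start.txt

echo "with the reset:"
awk "FNR==1{f=0} $prog" end.txt start.txt
```

Without the reset, awk reports a bogus pair, start.txt:0:pat A and start.txt:1:pat B, because f is still 1 when the second file begins. With the reset, nothing is printed, which is correct since neither file contains two adjacent matches.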