Keep only the first line from every sequence of consecutive lines matching a pattern

Tags: awk, sed, text-processing

If two or more consecutive lines contain a specific pattern, delete all of those matching lines except the first one.

In the example below, whenever two or more consecutive lines contain "logical IO", all of them must be deleted except the first.

Input file:

select * from test1 where 1=1
testing logical IO 24
select * from test2 where condition=4
parsing logical IO 45
testing logical IO 500
handling logical IO 49
select * from test5 where 1=1
testing logical IO 24
select * from test5 where condition=78
parsing logical IO 346
testing logical IO 12

Output file:

select * from test1 where 1=1
testing logical IO 24
select * from test2 where condition=4
parsing logical IO 45
select * from test5 where 1=1
testing logical IO 24
select * from test5 where condition=78
parsing logical IO 346

Best Answer

Using awk:

awk '/logical IO/ {if (!seen) {print; seen=1}; next}; {print; seen=0}' file.txt 
  • /logical IO/ {if (!seen) {print; seen=1}; next} handles lines that contain logical IO. If the variable seen is false (it starts out uninitialized, which awk treats as 0, and it is also 0 whenever the previous line did not contain logical IO), the line is printed and seen is set to 1. In either case, next skips to the next input line, so the remaining matching lines in the same run are not printed.

  • For any other line, {print; seen=0} prints the line and then resets seen to 0, so the next matching line starts a new run.
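The same logic can be wrapped in a small shell function so the pattern is not hard-coded. This is a sketch, not part of the original answer: the function name keep_first_of_run and the awk variable pat are hypothetical, and the pattern is assumed to be an awk extended regular expression. Values passed with -v undergo escape-sequence processing, so a literal backslash in the pattern has to be written as \\.

```shell
#!/bin/sh
# Hypothetical helper: keep only the first line of every run of
# consecutive lines matching the regex given as $1.
# Remaining arguments are input files; with none, awk reads stdin.
keep_first_of_run() {
    pattern=$1
    shift
    awk -v pat="$pattern" '
        $0 ~ pat { if (!seen) { print; seen = 1 }; next }  # first line of a run: print it; later ones: skip
        { print; seen = 0 }                                 # non-matching line: print it and end the run
    ' "$@"
}

# Usage: feed sample lines through a pipe.
printf '%s\n' 'select 1' 'a logical IO 1' 'b logical IO 2' 'select 2' 'c logical IO 3' |
    keep_first_of_run 'logical IO'
```

Because awk falls back to standard input when no file is named, the function works in a pipeline as well as with file arguments (for example, keep_first_of_run 'logical IO' file.txt).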

Example:

$ cat file.txt 
select * from test1 where 1=1
testing logical IO 24
select * from test2 where condition=4
parsing logical IO 45
testing logical IO 500
select * from test5 where 1=1
testing logical IO 24
select * from test5 where condition=78
parsing logical IO 346
parsing logical IO 346
testing logical IO 12

$ awk '/logical IO/ {if (!seen) {print; seen=1}; next}; {print; seen=0}' file.txt 
select * from test1 where 1=1
testing logical IO 24
select * from test2 where condition=4
parsing logical IO 45
select * from test5 where 1=1
testing logical IO 24
select * from test5 where condition=78
parsing logical IO 346
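The example above uses a slightly different file from the one in the question (for instance, it has no "handling logical IO 49" line). A sketch that runs the same one-liner on the question's own input and compares the result with the question's expected output:

```shell
#!/bin/sh
# Run the answer's awk one-liner on the input from the question.
actual=$(printf '%s\n' \
    'select * from test1 where 1=1' \
    'testing logical IO 24' \
    'select * from test2 where condition=4' \
    'parsing logical IO 45' \
    'testing logical IO 500' \
    'handling logical IO 49' \
    'select * from test5 where 1=1' \
    'testing logical IO 24' \
    'select * from test5 where condition=78' \
    'parsing logical IO 346' \
    'testing logical IO 12' |
    awk '/logical IO/ {if (!seen) {print; seen=1}; next}; {print; seen=0}')

# Expected output, copied from the question.
expected=$(printf '%s\n' \
    'select * from test1 where 1=1' \
    'testing logical IO 24' \
    'select * from test2 where condition=4' \
    'parsing logical IO 45' \
    'select * from test5 where 1=1' \
    'testing logical IO 24' \
    'select * from test5 where condition=78' \
    'parsing logical IO 346')

if [ "$actual" = "$expected" ]; then
    echo "output matches the question's expected output"
else
    echo "output differs" >&2
    exit 1
fi
```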