Delete n lines following a pattern (and the line matching the pattern)

find and replace, regex

How can I delete a line containing a matching pattern and the following n lines using a tool supporting regular expressions?

Said differently, how can I write a regular expression matching a line containing a matching pattern and the following n lines, so that I can replace them with nothing?

For example, if the matching pattern is bbbb and I also want to delete the 5 lines that follow it, then for the input file:

aldjflajdkl
aaaabbbbaaaa
1l;adfjl
2aldfjl
3adlflkdas
4aldfjd
5aldfkld
6dlafjlkdas

The output would be:

aldjflajdkl
6dlafjlkdas

It probably simplifies things that, in my specific case, the matching pattern (bbbb) cannot appear in the following 5 lines.

A solution already exists for sed, but it relies only partially on regular expressions, and uses custom replacement commands which are not portable.

Best Answer

A possible solution is:

.*<matching pattern>(.*\r?\n){<N+1>}

where N is the number of lines I want to remove after the line containing the pattern.
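As a minimal sketch of the template above, here is how it could be built for an arbitrary pattern and N in Python's re module. Python is only an assumption for illustration (the answer itself uses grepWin), and the function name delete_match_and_following is hypothetical.

```python
import re


def delete_match_and_following(text: str, pattern: str, n: int) -> str:
    """Remove every line containing `pattern`, plus the n lines after it.

    `pattern` is treated as a regex fragment; wrap it in re.escape()
    first if it should be matched literally.
    """
    # N+1 repetitions: one consumes the rest of the matching line,
    # the other n consume the lines that follow it.
    # The (?:...) wrapper is an addition of this sketch so that a pattern
    # containing alternation (a|b) stays self-contained.
    regex = r".*(?:" + pattern + r")(.*\r?\n){" + str(n + 1) + r"}"
    return re.sub(regex, "", text)


cleaned = delete_match_and_following("keep\nxx bbbb yy\na\nb\nkeep2\n", "bbbb", 2)
print(cleaned, end="")
```

Note that each of the n following lines must end with a line feed for the match to succeed, so a block that runs into the end of a file without a trailing newline would be left untouched.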

For the example given, this translates to:

.*bbbb(.*\r?\n){6}
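To check this regex outside grepWin, the following sketch applies it to the sample input from the question using Python's re.sub. Python is an assumption here, chosen only because its default regex behaviour ("." does not match a line feed) suits this pattern.

```python
import re

# Sample input from the question, with LF line endings.
sample = (
    "aldjflajdkl\n"
    "aaaabbbbaaaa\n"
    "1l;adfjl\n"
    "2aldfjl\n"
    "3adlflkdas\n"
    "4aldfjd\n"
    "5aldfkld\n"
    "6dlafjlkdas\n"
)

# Replace the matching line and the 5 lines after it with nothing.
result = re.sub(r".*bbbb(.*\r?\n){6}", "", sample)
print(result, end="")
# prints:
# aldjflajdkl
# 6dlafjlkdas
```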

This is how it looks in grepWin: [grepWin screenshot]

Side notes:

  • In the "The regex search string matches" tab, the 5aldfkld line is also marked as matched; it is just out of view, as the scroll bar visible on the right indicates.
  • (grepWin specific) Because of a small bug, when you apply this search to files, the Matches count increases by 7 for each match. That is probably because the counter counts matched lines, and here each match covers 7 lines: the line containing the pattern, the 5 lines after it, and the line reached by the final line feed.
  • (sed specific) This regex does not work in sed, which does not fully support this regex syntax and, because it processes its input one line at a time, has no easy way to match or replace newlines.

The following explains how I got to the solution.

I started from:

.*bbbb.*\n.*\n.*\n.*\n.*\n.*\n

which did not work on my system. The following, however, did work:

.*bbbb.*\r\n.*\r\n.*\r\n.*\r\n.*\r\n.*\r\n

So I am working on a CRLF system. However, this looks neither pretty nor portable.

I can make it a little bit more portable (and uglier :-) ) by doing:

.*bbbb.*\r?\n.*\r?\n.*\r?\n.*\r?\n.*\r?\n.*\r?\n

(The carriage return becomes optional.) It still looks ugly, but I can collapse the repeated term into a single quantified group:

.*bbbb(.*\r?\n){6}
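As a quick sanity check of this last step, the sketch below compares the written-out form with the collapsed form on both LF and CRLF versions of the sample input. It uses Python's re module, which is an assumption for illustration. In Python "." already matches a carriage return, so the \r? matters less there than in the tool used above, but both forms behave identically either way.

```python
import re

expanded = r".*bbbb.*\r?\n.*\r?\n.*\r?\n.*\r?\n.*\r?\n.*\r?\n"
collapsed = r".*bbbb(.*\r?\n){6}"

lines = [
    "aldjflajdkl", "aaaabbbbaaaa", "1l;adfjl", "2aldfjl",
    "3adlflkdas", "4aldfjd", "5aldfkld", "6dlafjlkdas",
]

# Build the same content with Unix (LF) and Windows (CRLF) line endings.
texts = {
    "LF": "".join(line + "\n" for line in lines),
    "CRLF": "".join(line + "\r\n" for line in lines),
}

results = {}
for name, text in texts.items():
    # Both forms should match exactly the same span of the text.
    same_span = re.search(expanded, text).span() == re.search(collapsed, text).span()
    results[name] = (same_span, re.sub(collapsed, "", text))
    print(name, same_span, repr(results[name][1]))
```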

This guide was very handy.
