Text Processing – Remove Lines Based on Pattern While Keeping First N Matches

awk, sed, text processing

I need to remove lines from a text file based on a pattern, but keep the first n lines that match the pattern.

Input

% 1
% 2
% 3
% 4
% 5
text1
text2
text3

Output

% 1
% 2
text1
text2
text3

I tried sed /^%/d file, but it deletes every line starting with %. sed 3,/^%/d doesn't work either. I need to keep the first n lines that match the pattern and delete the rest.

Best Answer

If you want to delete all lines starting with % but preserve the first two lines of input, you could do:

sed -e 1,2b -e '/^%/d'

Though the same would be more legible with awk:

awk 'NR <= 2 || !/^%/'

Or, if you're after performance:

{ head -n 2; grep -v '^%'; } < input-file

If you want to preserve the first two lines matching the pattern, which may not be the first lines of the input, awk is certainly the better option:

awk '!/^%/ || ++n <= 2'
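If the count and the pattern need to vary, the same awk idea can be parameterized. This is only a sketch: the function name keep_first_matches and the awk variables keep, pat and seen are made up for illustration, and it reads standard input only.

```shell
# Hypothetical helper: keep_first_matches N REGEX
# Prints every line that does not match REGEX, plus the first N that do.
# Note: awk processes backslash escapes in -v values, so a REGEX
# containing backslashes would need them doubled.
keep_first_matches() {
  awk -v keep="$1" -v pat="$2" '$0 !~ pat || ++seen <= keep'
}

# Demo: matching lines are interleaved with other text.
printf '%s\n' '% 1' '% 2' '% 3' text1 '% 4' text2 |
  keep_first_matches 2 '^%'
```

With that sample input, the lines % 3 and % 4 are dropped and everything else passes through in its original order.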

With sed, you could use tricks like:

sed -e '/^%/!b' -e 'x;/xx/{h;d;}' -e 's/^/x/;x'

That is, use the hold space to count the number of matching lines seen so far. Not terribly efficient or legible.
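To make the hold-space trick easier to follow, here is the same sed script spread over several lines with comments. The wrapper function name keep_first_two is made up for illustration. The commands are the ones from the one-liner above.

```shell
# Hypothetical wrapper around the sed one-liner, reading standard input.
keep_first_two() {
  sed '
    # Lines that do not start with % are printed unchanged.
    /^%/!b
    # Swap: the pattern space now holds the counter, the hold space the line.
    x
    # Counter already contains "xx": copy the counter back into the
    # hold space and delete without printing the line.
    /xx/{
      h
      d
    }
    # Otherwise add one "x" to the counter, then swap back so the
    # original line is printed and the counter is saved.
    s/^/x/
    x
  '
}

printf '%s\n' '% 1' '% 2' '% 3' text1 '% 4' text2 | keep_first_two
```

The counter is just a string of x characters, one per match kept so far, so keeping the first three matches instead would mean testing for /xxx/ rather than /xx/.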
